Red-Thing-Ism

Published on July 31, 2025 2:09 PM GMT

I

A botanist sets out to study plants in the Amazon rainforest. For her first research project, she sets her sights on “red things”, so as not to stretch herself too far. She looks at red flowers and notices how hummingbirds drink their nectar; she studies red fruits and notices how parrots eat them.

She comes to some tentative hypotheses: perhaps red attracts birds. Then she notices the red undersides of the bushes in the undergrowth. Confusing! All she can say at the end of her project is that red tends to be caused by carotenoids, and also anthocyanins.

A researcher living in Canada reads her work. He looks out his window at the red maples of autumn. Ah, he says, carotenoids and/or anthocyanins. Makes sense. Science.

Of course, we can see that not much useful work has been done here. We don’t understand the function of fruit or nectar or the red undersides of leaves. We don’t understand the reasons why the chlorophyll is drained from leaves in the autumn, while the red pigments remain.

II

What went wrong? A few things, but the big problem here is that “red things” is not a good category to study in the context of plants. Sure, the fact that we can point out the category means they might have a few things in common, but we’re not speaking the native language of biology here. To make sense of biology, we must think in terms of evolution, ecology and physiology.

We have to talk about “fruits” as a class of things-which-cause-animals-to-distribute-seeds; we have to ask which animals might distribute the seeds best, and how to attract them. Flowers are similar but not quite the same. The red leaf undersides are totally different: they reflect the long wavelengths of the light which reach the jungle floor, sending them back up into the cells above and giving the chlorophyll a second chance to absorb them.

Red-thing-ism has both lumped together unrelated things (fruit, flowers, leaves) and split the true-categories by a (somewhat) unnatural boundary (red flowers, other flowers).

This example has never quite happened, but evolutionary biologists see it all the time! The order “Insectivora” was constructed to contain hedgehogs, shrews, golden moles, and the so-called flying lemur. But this group didn’t evolve from a single common ancestor; many animals simply converged on the same lifestyle and body plan.

III

Evolutionary biologists have tightly restricted themselves to speaking only the language of evolution itself, which is why they end up saying such deranged things as “whales are mammals”, “birds are dinosaurs”, and “fish don’t exist”. By doing so, they’ve succeeded in studying a process which played out over billions of unobservable years: the method works.

Platt emphasized strong inference as the hallmark of a successful field. I’ll identify another: avoiding red-thing-ism.

IV

Red-thing-ism is at its most common when there is little knowledge of the structure of the field. It’s especially common in AI safety research. A particular example is “deception”. Most people who talk about deception lump together several different things:

1. Outputting text which does not represent the AI’s best model of the world, according to standard symbol-grounding rules.
2. Modelling a person as an agent with a belief system, and trying to shift that belief system away from your own best guess about the world.
3. Modelling a person as a set of behaviours, and taking actions which cause them to produce some output. Like feeding ants a mixture of borax and sugar.
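The three-way distinction above is easy to blur in prose. A toy Python sketch makes the lumping visible — note that every name and behaviour here is an illustrative invention, not any real system’s API:

```python
# Toy sketch of three things often lumped together as "deception".
# All functions and data here are hypothetical illustrations.

world_model = {"sky_color": "blue"}  # the system's own best model of the world

# 1. Outputting text that contradicts the system's own world model.
def misreport(question):
    if question == "sky_color":
        return "green"  # contradicts world_model
    return world_model.get(question)

# 2. Modelling the listener as an agent with beliefs, and pushing those
#    beliefs away from the system's own best guess.
def shift_belief(listener_beliefs):
    listener_beliefs["sky_color"] = "green"  # diverges from world_model
    return listener_beliefs

# 3. Modelling the other party as a pure stimulus-response function and
#    steering its *output*, with no belief modelling at all -- the
#    borax-and-sugar-for-ants case.
def steer_behaviour(stimulus_response):
    return stimulus_response("sweet bait")
```

The point of the sketch is that these three functions share almost no structure: only the first two even refer to the system’s own world model, and only the second models the other party as having beliefs at all.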

In the language of AI and thinking systems, these are not the same thing. But even if we restrict our analysis to individual cases, such as 3, we run into a problem. Red-thing-ism doesn’t just lump unnaturally, it splits unnaturally. Part of the boundary of 3 is “I don’t like it”. The same is true for things like goal misgeneralization. What defines a mis-generalization apart from a good generalization is “I don’t like it”.

So far, “I don’t like it” has not been translated into the native language of AI and cognitive systems. Doing so is, in fact, a very large and very hard part of the alignment problem! There are a lot of topics being studied where the object of study is just assumed to be a meaningful category, but where the meaningfulness of that category requires a big chunk of alignment to have already been solved.

(Another common error is to sidestep issues in symbol grounding entirely, as in the question of whether a given sequence of tokens “is false”.)

Once I started noticing this, I couldn’t stop seeing it. Whether an AI is "a schemer" is another example, but it applies to the gears of a lot of applied research. Many agendas in control, oversight, and probing look like “we want to reduce chain-of-thought obfuscation” or “we want to detect scheming”. These all appear a bit hollow and meaningless to me now. Oh dear!


