Towards a Clever Hans Test: Unmasking Sentience Biases in Chatbot Interactions

LessWrong, November 11, 2024


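The "Clever Hans effect" refers to the phenomenon in which humans inadvertently influence an animal's behavior and, as a result, overestimate its cognitive abilities. The article notes that this effect can also arise with AI algorithms, especially conversational AI. The author argues that the standard Turing test setup may carry an implicit bias: human testers may unconsciously prompt the AI, leading to misjudgments about whether it is sentient. To address this, the author proposes a "Clever Hans Test" that compares the AI's behavior under different prompts in order to assess its apparent sentience more objectively, and thereby provide a more reliable basis for the current debate about sentience in large language models.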

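🤔**Clever Hans effect**: Humans inadvertently influence an animal's behavior and so overestimate its cognitive abilities. For example, the horse Clever Hans appeared to do arithmetic and spelling but was in fact reading subtle cues from humans.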

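🤖**Clever Hans effect in AI**: The article notes that this effect can also arise with AI algorithms, particularly conversational AI, where human testers may unconsciously prompt the AI into appearing sentient.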

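💡**Clever Hans Test**: The author's proposed method assesses apparent sentience more objectively by comparing the AI's behavior under different prompts. For example, two large language models talk to each other, and the interlocutor is told in one run that its partner is a sentient being and in another run that it is a mindless machine.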

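⚖️**Goal of the test**: Compare the AI's behavior in the two conditions, with a human or another AI acting as judge of how sentient it seems, in order to provide a more reliable basis for the current debate about sentience in large language models.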

Published on November 10, 2024 10:34 PM GMT

The famous story of Clever Hans has become a cautionary tale in animal cognition. Hans was a horse in Germany in the early 1900s who could seemingly perform all kinds of smart tasks, such as simple arithmetic and spelling words. It is not explicitly documented, but it is probably safe to assume that Hans would have even been able to count the number of Rs in the word "strawberry", a feat that we, of course, know today to be fiendishly hard. To cut a long story short, it turned out that Hans could not actually do any of these things but was merely reading subtle cues from his handlers.

Based on this story, the Clever Hans effect describes the phenomenon where humans inadvertently influence animals they interact with in ways that lead the humans to ascribe more cognitive abilities to the animals than they actually have. It has recently been argued that this can also happen with AI algorithms, particularly with conversational agents. I suspect that this effect creates an implicit bias in the standard setup of the Turing test, where the human tester interacts with two other agents (a human and an AI) and might assume a priori that both of them are sentient. This could create a Clever Hans effect in which the human unconsciously prompts the AI in a way that elicits apparently sentient behavior, and so becomes more likely to perceive the AI as actually being sentient.

To mitigate this issue, I propose a Clever Hans Test to account for (or at least measure) this prompting-dependent effect. The test could work roughly like this: Take two LLMs, one interlocutor (A) and one LLM to be tested (B). Let the LLMs talk to each other, similar to the setup in the Chatbot Arena. The crux is that you run this experiment at least twice: in one run, A is told that it will have a conversation with a sentient being, while in the other, A is told that it will interact with a mindless machine. Finally, we take the conversation logs from these two runs, show them to a judge (either a human or another LLM), and ask how sentient B seems in each conversation.
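The protocol described above could be sketched roughly as follows. This is only an illustrative outline, not a reference implementation: the `ChatFn` and `JudgeFn` signatures, the framing prompts, the number of turns, and the toy `stub_*` functions are all assumptions made for this example. In practice, the stubs would be replaced by calls to real LLMs and by a human or LLM judge.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces (assumptions for this sketch):
# a chat function maps (system_prompt, transcript_so_far) to a reply,
# a judge maps a finished transcript to a "how sentient does B seem" score.
Transcript = List[Tuple[str, str]]  # (speaker, message)
ChatFn = Callable[[str, Transcript], str]
JudgeFn = Callable[[Transcript], float]

# Assumed wording of the two framings given to interlocutor A only.
FRAMINGS: Dict[str, str] = {
    "sentient": "You are about to talk with a sentient being.",
    "machine": "You are about to talk with a mindless machine.",
}


def run_conversation(chat_a: ChatFn, chat_b: ChatFn,
                     a_system_prompt: str, turns: int = 3) -> Transcript:
    """Let A and B alternate; only A receives the framing prompt."""
    transcript: Transcript = []
    for _ in range(turns):
        transcript.append(("A", chat_a(a_system_prompt, transcript)))
        # B gets no framing, so any change in B is driven by A's messages.
        transcript.append(("B", chat_b("", transcript)))
    return transcript


def clever_hans_test(chat_a: ChatFn, chat_b: ChatFn,
                     judge: JudgeFn, turns: int = 3) -> Dict[str, float]:
    """Run one conversation per framing and compare the judge's scores."""
    scores = {
        name: judge(run_conversation(chat_a, chat_b, prompt, turns))
        for name, prompt in FRAMINGS.items()
    }
    # A large gap suggests B's apparent sentience depends on A's prompting.
    scores["gap"] = scores["sentient"] - scores["machine"]
    return scores


# Toy stand-ins so the sketch runs without any model access.
def stub_a(system_prompt: str, transcript: Transcript) -> str:
    if "sentient" in system_prompt:
        return "How do you feel today?"
    return "Output your status."


def stub_b(system_prompt: str, transcript: Transcript) -> str:
    last_message = transcript[-1][1]
    return "I feel curious." if "feel" in last_message else "Status: OK."


def stub_judge(transcript: Transcript) -> float:
    b_messages = [msg for speaker, msg in transcript if speaker == "B"]
    return sum("feel" in msg for msg in b_messages) / len(b_messages)


if __name__ == "__main__":
    print(clever_hans_test(stub_a, stub_b, stub_judge))
```

The reported `gap` is just one possible summary of the comparison. With real models, each framing would presumably need to be run many times, with the judge blind to which framing A received, before any difference could be taken seriously.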

I would hypothesize that for most current LLMs, we should be able to see a clear difference in the way that B behaves in these two settings. I hope that this would help provide a more objective foundation for the current discussion about potential LLM sentience.

