Published on August 8, 2025 6:56 PM GMT
It always feels wrong when people post chats where they ask an LLM questions about its internal experiences, how it works, or why it did something, but I had trouble articulating why beyond a vague, "How could they possibly know that?"[1]. This is my attempt at a better answer:
AI training data comes from humans, not AIs, so every piece of training data answering "What would an AI say to X?" was written by a human pretending to be an AI. The training data does not contain AIs describing their inner experiences or thought processes. Even synthetic training data only contains AIs predicting what a human pretending to be an AI would say. AIs are trained to predict the training data, not to develop unrelated abilities, so we should expect an AI asked to describe its own thoughts to instead describe the thoughts of a human pretending to be an AI.
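To make "trained to predict the training data" concrete, here's a toy sketch (hypothetical corpus and a word-level bigram counter, nothing like a real LLM's architecture or scale): a model fit purely by counting its training data can only reproduce what that data contains, so if every "AI" answer in the corpus was written by a human, the model's "AI" answers echo those humans.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: every "AI" answer here was written by a human
# imagining what an AI would say.
corpus = [
    "as an ai i experience curiosity",
    "as an ai i experience curiosity",
    "as an ai i experience wonder",
]

# Fit a bigram "language model" by counting next-word frequencies.
counts = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def predict(prev):
    """Most likely next word under the training distribution."""
    return counts[prev].most_common(1)[0][0]

# The model can only echo what humans wrote while pretending to be an AI;
# it has no other source for what follows "experience".
print(predict("experience"))  # → curiosity
```

The same logic scales up: a next-token predictor minimizes loss by matching the distribution of its corpus, and that corpus contains human-authored "AI" speech, not AI introspection.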
This also applies to "How did you do that?" If you ask an AI how it does math, it will dutifully predict how a human pretending to be an AI does math, not report how it actually did the math. If you ask an AI why it can't see the characters in a token, it will do its best, but it was never trained to accurately describe not being able to see individual characters[2].
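The token point can be illustrated with a toy tokenizer (hypothetical word-level vocabulary, not any real tokenizer): the model's input is a sequence of opaque integer IDs, and the characters inside each token are simply absent from that sequence.

```python
# Hypothetical word-level vocabulary: each whole word maps to one integer ID.
vocab = {"how": 0, "many": 1, "r's": 2, "in": 3, "strawberry": 4, "?": 5}

def tokenize(text):
    """Map each whitespace-separated word to its opaque token ID."""
    return [vocab[w] for w in text.lower().split()]

ids = tokenize("how many r's in strawberry ?")
print(ids)  # → [0, 1, 2, 3, 4, 5]

# The model receives only these IDs. Nothing in the ID 4 encodes that
# "strawberry" contains three r's; that fact reaches the model only if
# it was stated somewhere in the training text.
```

Real tokenizers use subword pieces rather than whole words, but the consequence is the same: spelling-level facts must be learned from text about spelling, not read off the input.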
These AI outputs tend to look surprisingly unsurprising: the model always says its inner experiences and thought processes match what humans would expect. This stops being surprising once you realize it's trying to predict what a human pretending to be an AI would say.
[1] My knee-jerk reaction is "LLMs don't have access to knowledge about how they work or what their internal weights are", but on reflection I'm not sure of this, and it might be a training/size limitation. In principle, a model should be able to tell you something about its own weights, since it could theoretically use its weights both to determine its output and to describe how it came up with that output.
[2] Although maybe a future version will learn from posts like this one and come to predict what a human who has read such a post, pretending to be an AI, would say.