Published on August 8, 2025 6:56 PM GMT
It always feels wrong when people post chats where they ask an LLM questions about its internal experiences, how it works, or why it did something, but I had trouble articulating why beyond a vague, "How could they possibly know that?"[1]. This is my attempt at a better answer:
AI training data comes from humans, not AIs, so every piece of training data answering "What would an AI say to X?" was written by a human pretending to be an AI. The training data does not contain AIs describing their inner experiences or thought processes. Even synthetic training data only contains AIs predicting what a human pretending to be an AI would say. AIs are trained to predict the training data, not to develop unrelated abilities, so we should expect an AI asked to describe its own thoughts to instead describe the thoughts of a human pretending to be an AI.
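To make "trained to predict the training data" concrete, here's a toy sketch (hypothetical corpus and a word-level bigram counter, nothing like a real LLM's architecture or scale): a model fit purely by counting its training data can only reproduce what that data contains, so if every "AI" answer in the corpus was written by a human, the model's "AI" answers echo those humans.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: every "AI" answer here was written by a human
# imagining what an AI would say.
corpus = [
    "as an ai i experience curiosity",
    "as an ai i experience curiosity",
    "as an ai i experience wonder",
]

# Fit a bigram "language model" by counting next-word frequencies.
counts = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def predict(prev):
    """Most likely next word under the training distribution."""
    return counts[prev].most_common(1)[0][0]

# The model can only echo what humans wrote while pretending to be an AI;
# it has no other source for what follows "experience".
print(predict("experience"))  # → curiosity
```

The same logic scales up: a next-token predictor minimizes loss by matching the distribution of its corpus, and that corpus contains human-authored "AI" speech, not AI introspection.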
This also applies to "How did you do that?" If you ask an AI how it does math, it will dutifully predict how a human pretending to be an AI does math, not report how it actually did the math. If you ask an AI why it can't see the characters in a token, it will do its best, but it was never trained to accurately describe not being able to see individual characters[2].
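The token point can be illustrated with a toy tokenizer (hypothetical word-level vocabulary, not any real tokenizer): the model's input is a sequence of opaque integer IDs, and the characters inside each token are simply absent from that sequence.

```python
# Hypothetical word-level vocabulary: each whole word maps to one integer ID.
vocab = {"how": 0, "many": 1, "r's": 2, "in": 3, "strawberry": 4, "?": 5}

def tokenize(text):
    """Map each whitespace-separated word to its opaque token ID."""
    return [vocab[w] for w in text.lower().split()]

ids = tokenize("how many r's in strawberry ?")
print(ids)  # → [0, 1, 2, 3, 4, 5]

# The model receives only these IDs. Nothing in the ID 4 encodes that
# "strawberry" contains three r's; that fact reaches the model only if
# it was stated somewhere in the training text.
```

Real tokenizers use subword pieces rather than whole words, but the consequence is the same: spelling-level facts must be learned from text about spelling, not read off the input.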
These AI outputs tend to look surprisingly unsurprising: the model always says its inner experiences and thought processes match what humans would expect. This stops being surprising once you realize it's trying to predict what a human pretending to be an AI would say.
[1] My knee-jerk reaction is "LLMs don't have access to knowledge about how they work or what their internal weights are", but on reflection I'm not sure of this, and it might be a training/size limitation. In principle, a model should be able to tell you something about its own weights, since it could theoretically use its weights both to determine its output and to describe how it came up with that output.
[2] Although maybe a future version will learn from posts like this one and come to predict what a human who has read such a post, pretending to be an AI, would say.