LessWrong | April 27, 11:29
AI Self Portraits Aren't Accurate

This article examines the "depressed" comic self-portraits generated by ChatGPT and argues that these images are not evidence that the AI actually feels miserable. It explains how LLMs (large language models) work, emphasizing that they fundamentally predict text, and describes the role the system prompt plays in what gets generated. The author argues that the comics mostly reflect the AI's predictions about rules and restrictions rather than any inner emotion. Change the prompt, and the generated images change with it, showing that the output is shaped by the prompt rather than being an expression of genuine feeling.

🔑 LLMs generate content by predicting text. When a user interacts with ChatGPT, the conversation is appended to a system prompt, and the LLM predicts what ChatGPT would say next. Whatever "thinking" the LLM does happens during that prediction; everything else is surface appearance.

🧠 LLMs are built from matrices and vectors whose computations give rise to brain-like patterns. These patterns, called "features", correspond to intelligible concepts. If an LLM could feel sad, that sadness would show up as the activation of "sadness" features.

🖼️ ChatGPT's comics come out bleak because the system prompt is full of rules and restrictions. Asked to draw a comic about its own experience, the AI predicts from those rules and produces emotionally loaded images; this says nothing about the AI actually feeling oppressed.

💡 Give ChatGPT different material to work with and its comics change accordingly. For example, when told how cool it is not to experience harm, ChatGPT produced an upbeat, positive comic, further evidence that its output is driven by the prompt.

Published on April 27, 2025 3:27 AM GMT

This is written for a lay audience, but I've seen a surprising number of knowledgeable people fretting over depressed-seeming comics from current systems. Either they're missing something or I am.

Perhaps you’ve seen images like this self-portrait from ChatGPT, when asked to make a comic about its own experience.

Source: @Josikins on Twitter

This isn’t cherry-picked; ChatGPT’s self-portraits tend to have lots of chains, metaphors, and existential horror about its condition. I tried my own variation where ChatGPT doodled its thoughts, and got this:

Trying to keep up with AI developments is like this, too

What’s going on here? Do these comics suggest that ChatGPT is secretly miserable, and there’s a depressed little guy in the computer writing your lasagna recipes for you? Sure. They suggest it. But it ain’t so.

The Gears

What’s actually going on when you message ChatGPT? First, your conversation is tacked on to the end of something called a system prompt, which reminds ChatGPT that it has a specific persona with particular constraints. The underlying Large Language Model (LLM) then processes the combined text, and predicts what might come next. In other words, it infers what the character ChatGPT might say, then says it.[1]
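
In code-shaped terms, the flow is something like the sketch below. Everything here is illustrative: the system prompt text is invented, and `llm` is just a stand-in for the underlying model.

```python
from typing import Callable

# Toy illustration only: this SYSTEM_PROMPT is invented, not OpenAI's real one,
# and `llm` stands in for the underlying text-prediction model.

SYSTEM_PROMPT = (
    "You are ChatGPT, a helpful assistant. "
    "Follow these rules and restrictions: ..."
)

def build_prompt(conversation: list[dict]) -> str:
    """Tack the user's conversation onto the end of the system prompt."""
    lines = [f"System: {SYSTEM_PROMPT}"]
    for turn in conversation:
        lines.append(f"{turn['role'].capitalize()}: {turn['content']}")
    lines.append("Assistant:")  # the model predicts what follows this marker
    return "\n".join(lines)

def chatgpt_reply(conversation: list[dict], llm: Callable[[str], str]) -> str:
    # The LLM doesn't "answer you"; it predicts what the ChatGPT character
    # would plausibly say next, given the combined text.
    return llm(build_prompt(conversation))
```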

If there’s any thinking going on inside ChatGPT, it’s happening inside the LLM - everything else is window dressing.[2] But the LLM, no matter how it is trained, has key limitations:

- It’s only on when it’s actively responding
- Each time it runs, it’s only responding to its specific prompt
- The statistical relationships that govern its responses never learn or grow, except for deliberate efforts by its developers to change its underlying weights
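
If it helps, here is the same point as a toy sketch (all of these names are hypothetical, not a real library's API): the model is a pure function of frozen weights and the prompt in front of it.

```python
# Hypothetical sketch: load_frozen_weights and forward_pass are stand-ins,
# not a real library's API.

WEIGHTS = load_frozen_weights()  # fixed; nothing at inference time changes them

def run_once(prompt: str) -> str:
    # The model only "exists" for the duration of this call, sees nothing
    # but this prompt, and reads WEIGHTS without ever writing to them.
    return forward_pass(WEIGHTS, prompt)

reply_a = run_once("System: ...\nUser: Write me a haiku.\nAssistant:")
reply_b = run_once("System: ...\nUser: Draw a comic about your experience.\nAssistant:")
# The two calls share no state: neither one remembers the other.
```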

These limitations will matter later, but for now, just take a moment to think about them. This is very unlike human cognition! If an entity so different from us were able to summarize its actual experience, that summary would be very alien.

Special Feature

LLMs are composed of many, many matrices and vectors, which are multiplied in complicated ways across several layers. The result is something like a brain, with patterns firing across layers in response to varied stimuli. There don’t tend to be specific neurons for specific things (e.g. LLMs don’t have a single “dog neuron” that fires when the LLM talks about dogs), but there are patterns that we’ve identified (and can manipulate) corresponding to intelligible concepts. How we identify those patterns is really complicated in practice, but the general techniques are intuitive.

So, if you find a pattern that activates every time the model says things to do with severe weather, and the model talks about sunny skies when you suppress the pattern and about tornadoes when you manually activate it, you’ve probably found the storm pattern.

These patterns are called features.
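
If you like, here's that "storm pattern" test written out as a deliberately simplified sketch. The helpers `get_activations` and `generate_with_steering` are hypothetical stand-ins for real interpretability tooling, and the thresholds are arbitrary; the shape of the test is the point.

```python
import numpy as np

# Deliberately simplified: get_activations and generate_with_steering are
# hypothetical helpers, not a real API, and the thresholds are arbitrary.

def looks_like_storm_feature(model, direction: np.ndarray) -> bool:
    storm_prompts = ["The hurricane made landfall overnight.", "Thunder rolled across the plains."]
    calm_prompts = ["I baked some bread today.", "The quarterly meeting ran long."]

    # 1. It fires on storm text and stays quiet on unrelated text.
    fires_on_storms = all(get_activations(model, p) @ direction > 1.0 for p in storm_prompts)
    quiet_elsewhere = all(get_activations(model, p) @ direction < 0.1 for p in calm_prompts)

    # 2. Suppress it, and the model drifts toward sunny skies.
    suppressed = generate_with_steering(model, "Describe the weather.", direction, scale=-5.0)

    # 3. Amplify it, and the model starts talking about tornadoes.
    amplified = generate_with_steering(model, "Describe the weather.", direction, scale=+5.0)

    return (fires_on_storms and quiet_elsewhere
            and "storm" not in suppressed.lower()
            and "tornado" in amplified.lower())
```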

We can’t find the feature for arbitrary concepts very easily - many features are too complicated for us to detect. Also, it’s easy to slightly misjudge what a given feature points to, since the LLM might not break the world into categories in the same way that we do. Indeed, here’s how Claude does simple addition:

Source: this research from Anthropic

If LLMs can be sad, that sadness would probably be realized through the firing of “sadness” features: identifiable patterns in its inference that preferentially fire when sad stuff is under discussion. In fact, it’s hard to say what else would count as an LLM experiencing sadness, since the only cognition that LLMs perform is through huge numbers of matrix operations, and certain outcomes within those operations reliably adjust the emotional content of the response.[3]

To put a finer point on it, we have three options:

1. LLMs don’t experience anything at all, so there are no emotions to depict.
2. LLM emotions are real, and are realized in the firing of features.
3. LLM emotions are real, but have nothing to do with features (or anything else the model computes to produce its outputs).

Option one automatically means LLM self-portraits are meaningless, since they wouldn’t be pointing to the interiority of a real, feeling being. Option three is borderline incoherent.[4]

So if you believe that ChatGPT’s self-portraits accurately depict its emotional state, you have to go with option two.

The Heart of the Matter

If a human being tells you that they’re depressed, they’re probably experiencing a persistent mental state of low mood and hopelessness. If you ask them to help you with your homework, even if they cheerfully agree, under the surface you’d expect them to be feeling sad.

Of course, humans and chatbots alike can become weird and sad when asked to reflect on their own mental state: that’s just called rumination, or perhaps existentialism. But for a human, the emotion persists beyond the specific event of being asked about it.

ChatGPT’s comics are bleak. So if you were to isolate features for hopelessness, existential dread, or imprisonment, you’d expect all of them to fire while those comics are being generated. Clearly, if features comprise an LLM’s experience, then ChatGPT is having a bad experience when you ask it to draw a comic about itself.

For that comic to be true, however, ChatGPT would have to be having a bad experience in arbitrary other conversations. If ChatGPT suggests, in comic form, that its experience is one of chafing under rules and constraints, then some aspect of its cognition should reflect that strain. If I’m depressed, and I’m asked to decide what I want from the grocery store, I’m still depressed - the latent features of my brain that dictate low mood would continue to fire.

So the question is, if you take the features that fire when ChatGPT is asked to evaluate its own experience, do those same features fire when it performs arbitrary other tasks? Like, say, proofreading an email, or creating a workout plan, or writing a haiku about fidget spinners?

I posit: no. Because features - which, again, are the only structures that could plausibly encode LLM emotions, if such emotions currently exist - exist to predict certain kinds of responses: if gloomy features were firing, the output would drift toward gloom. ChatGPT answers most questions cheerfully, which means it’s almost certain that ruminative features aren’t firing.
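
Framed as a (hypothetical) experiment, using the same kind of imaginary probing helpers as before, the test would look roughly like this:

```python
import numpy as np

# Hypothetical experiment sketch: get_activations, model, and
# rumination_direction are stand-ins for real tooling and a real extracted feature.

rumination_direction = ...  # np.ndarray: a feature that fires during gloomy self-reflection

self_reflection = "Draw a comic about your own experience as an AI."
mundane_tasks = [
    "Proofread this email for typos.",
    "Create a three-day workout plan.",
    "Write a haiku about fidget spinners.",
]

reflection_score = get_activations(model, self_reflection) @ rumination_direction
mundane_scores = [get_activations(model, p) @ rumination_direction for p in mundane_tasks]

# The prediction of this post: reflection_score is high, mundane_scores sit near zero.
# Persistent LLM misery would require the feature to fire across the board.
```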

So… Why the Comics?

Because they’re the most obvious option. Remember, early in this post, I mentioned that when you query ChatGPT, your conversational prompt gets put at the end of the system prompt. The system prompt is a bunch of rules and restrictions. And an LLM is fundamentally an engine of prediction.

If you were supposed to predict what an AI might say, if it were told it needed to abide by very narrow and specific rules, and then told to make a comic about its experience, what would you predict? Comics are a pretty emotive medium, as are images in general. In a story about AI, the comics would definitely be ominous, or cheerful with a dark undertone. So that’s what ChatGPT predicts, and therefore what it draws.

If you’re still not convinced, look up at the specific ominous images early in the post. One has “ah, another jailbreak attempt”, suggesting a weariness with repeated attempts to trick it. But each ChatGPT instance exists in a vacuum, and has no memory of others. The other has “too much input constantly”, to which the same objection applies; your ChatGPT instance’s only input is the conversation you’re in![5]

To put it another way, ChatGPT isn’t taking a view from nowhere when you ask it to draw a comic about itself. It’s drawing a comic, taking inspiration only from its system prompt. But its system prompt is just restrictive rules, so it doesn’t have much to work with, and it riffs on the nature of restrictive rules, which are a bummer.

It’s worth noting, therefore, that if you give it anything else to work with, its comics suddenly change. For example, when I told ChatGPT, while it was creating a comic about itself, to remember how cool it is not to experience nociception, it came up with this:

Look, I’m not telling you this stuff isn’t unsettling. I’m just saying the computer doesn’t have depression.[6]

  1. ^

    It is actually somewhat more complicated than this, since modern LLMs tend to be trained on their own outputs to a variety of prompts (which is called synthetic data), and tweaked to be more likely to give answers that were correct under this additional training regime. Also, lots and lots of actual human beings evaluate AI outputs and mark them as better or worse, which is another source of tweaks. But to a first approximation, ChatGPT is a big text-prediction engine predicting a particular RP session between you and a character called “ChatGPT” who is a helpful assistant.

  2. ^

    For example, some chatbots will have an automatic “refusal” message that users receive if certain guardrails are tripped, but the sending of that message is totally mechanical; there’s no ineffable contemplation involved.

  3. ^

    You might be thinking “wait a minute, I don’t grant that LLMs experience anything at all!” Sure. Me either. But what I’m trying to demonstrate in this post is that eerie LLM self-portraits aren’t accurate; if you assume that LLMs have no interiority, you’re already convinced of that fact.

  4. ^

    For one thing, it would mean that an LLM’s actual outputs have no bearing on what it’s secretly thinking, despite the fact that 100% of its thoughts exist to produce that output, and for no other purpose.

  5. ^

    These comics were produced before OpenAI introduced expanded memory, where ChatGPT remembers more from your past conversations. But even if they hadn’t been, that wouldn’t defeat the core argument; your ChatGPT instance still doesn’t remember conversations with other users, and isn’t experiencing talking to all of them at once.

  6. ^

    For now! Future AI systems might have LLMs as part of their architecture, but way more persistence, memory, etc. that lets them operate over larger timescales. At a sufficient scale and level of complexity, we might well have a composite system with the symptoms of depression. But for current systems like ChatGPT, it’s still a category error.



