LessWrong, April 24
Cognitive Dissonance is Mentally Taxing

 

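The article explores cognitive dissonance, the psychological discomfort that arises when behavior is inconsistent with inner beliefs. Using classic experiments, it describes how people reduce this discomfort by changing their beliefs. It then applies the concept to the AI alignment problem, arguing that if an AI, like a human, tries to minimize "free energy", then correcting its inconsistent behavior can push it closer to true alignment. Finally, it proposes that reducing the compute a model needs may reduce its capacity to hide or pursue misaligned strategies.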

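🤔 Cognitive dissonance stems from a conflict between behavior and belief, which causes psychological discomfort; people tend to relieve it by changing their beliefs.

💰 A classic experiment: participants who lied for a small payment changed their view of the task itself, deciding it was not so boring, in order to reduce their dissonance.

😩 The effort-justification experiment shows that people overvalue things they have worked hard for, rationalizing their behavior and so reducing dissonance.

✅ Post-decision dissonance: after making a choice, people tend to rate the chosen option higher and the rejected option lower, to confirm that they chose correctly.

🤖 The article speculates that if an AI also tries to minimize "free energy", correcting its inconsistent behavior may bring it closer to alignment; reducing a model's compute may reduce its capacity to hide misaligned strategies.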

Published on April 24, 2025 12:38 AM GMT

Cognitive dissonance is the discomfort we feel when our beliefs don't line up with our actions. More generally, the two dissonant things don't need to be belief and action. It can be the discomfort we feel when different beliefs of ours are in contradiction. It can also arise when our different actions lead to conflicting goals. Often, we aren't even fully conscious of this discomfort. It may be hard to notice in ourselves, but it is easy to spot in others.

Here are some of the old, classic examples from the empirical literature, served up and summarized courtesy of Gemini 2.5 Pro. It's unclear whether these studies would survive replication attempts, but let's examine them nonetheless:
 

    The Classic Forced Compliance Study (Festinger & Carlsmith, 1959):
      Behavior: Participants performed extremely dull and repetitive tasks (like turning pegs on a board for an hour). Afterwards, they were asked to tell the next participant (who was actually a confederate) that the tasks were very interesting and enjoyable.
      Manipulation: Participants were paid either $1 or $20 (a significant amount in 1959) to lie. A control group did the tasks but didn't lie.
      Dissonance: Those paid only $1 experienced high dissonance. Their behavior (lying) conflicted strongly with their belief (the task was incredibly boring), and they had insufficient external justification for the lie ($1 wasn't really enough to justify misleading someone). Those paid $20 had sufficient external justification – they could tell themselves they lied for the money.
      Belief Change: When later asked to rate how enjoyable the tasks actually were, the participants paid $1 rated the tasks significantly more enjoyable than those paid $20 or those in the control group. To reduce the dissonance caused by lying for a paltry sum, they unconsciously changed their belief about the task itself, convincing themselves it wasn't so bad after all. The $20 group didn't need to change their belief; the money justified the lie.

    Effort Justification Study (Aronson & Mills, 1959):
      Behavior: Female college students volunteered to join a group that would discuss the psychology of sex.
      Manipulation: To be admitted to the group, participants had to undergo an "embarrassment test" (initiation). This initiation was either severe (reading obscene words and lurid sexual passages aloud), mild (reading less embarrassing sex-related words), or there was no initiation (control group). All participants then listened to a pre-recorded discussion by the group they supposedly joined, which was designed to be incredibly dull and banal.
      Dissonance: Those who underwent the severe initiation experienced high dissonance. Their behavior (undergoing an embarrassing and difficult initiation) conflicted with the reality they encountered (the group discussion was worthless and boring). "Why did I go through that ordeal for this?"
      Belief Change: When asked to rate the discussion and the group members, participants who had gone through the severe initiation rated the discussion and the group members significantly more favorably than those in the mild or no-initiation conditions. To justify the effort and embarrassment of their behavior (the severe initiation), they changed their belief about the group, convincing themselves it was actually quite interesting and worthwhile.

    Post-Decision Dissonance Study (Brehm, 1956):
      Behavior: Participants (women shoppers) were asked to rate the desirability of several household appliances (like toasters, coffee makers, etc.). As a reward for participating, they were told they could choose one of two appliances to take home.
      Manipulation: Some participants were offered a choice between two items they had rated as highly and similarly desirable (high-dissonance condition – difficult choice). Others were offered a choice between one highly desirable item and one they had rated much lower (low-dissonance condition – easy choice). After making their choice, participants were asked to rate the products again.
      Dissonance: Those in the high-dissonance condition experienced more conflict after making their choice. They had chosen one attractive item but had to reject another almost equally attractive item. This creates dissonance: thoughts about the positive features of the rejected item and potential negative features of the chosen item conflict with the behavior of choosing.
      Belief Change: In the second rating, participants in the high-dissonance condition increased their rating of the item they chose and decreased their rating of the item they rejected. They effectively "spread the alternatives" in their minds to make their choice seem more obvious and justified after the fact. This reduces the dissonance by solidifying the belief that they made the right decision.
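
The "spreading of alternatives" in the Brehm study can be written as simple arithmetic: how much the chosen item's rating rose plus how much the rejected item's rating fell. The sketch below is only an illustration; the function name is hypothetical and the ratings are made-up numbers, not data from the study.

```python
def spread_of_alternatives(pre_chosen, pre_rejected, post_chosen, post_rejected):
    """Return how far the chosen and rejected items drifted apart after the choice.

    A positive value means the chosen item was rated up and/or the
    rejected item was rated down between the two rating rounds.
    """
    return (post_chosen - pre_chosen) + (pre_rejected - post_rejected)


# Made-up ratings for illustration only (not taken from Brehm, 1956).
# Difficult choice: two items rated close together before choosing.
hard_choice = spread_of_alternatives(
    pre_chosen=6.0, pre_rejected=5.5, post_chosen=6.5, post_rejected=5.0
)

# Easy choice: the items were far apart and the ratings did not move.
easy_choice = spread_of_alternatives(
    pre_chosen=6.0, pre_rejected=3.0, post_chosen=6.0, post_rejected=3.0
)

print(hard_choice)  # 1.0
print(easy_choice)  # 0.0
```

In this toy case the difficult choice shows a spread of 1.0 rating points and the easy choice shows none, which is the qualitative pattern the summary above describes.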

These are all pretty interesting in their own right, and at this point this essay could veer into many different directions, each of which would be very fitting for a LessWrong post. For example, all of these posts are at least tangentially related to the topic:

    Rationalization
    Motivated Stopping and Motivated Continuation
    You are not too "irrational" to know your preferences.
    Suffering as attention-allocational conflict
    Where Recursive Justification Hits Bottom
    Making Beliefs Pay Rent (in Anticipated Experiences)

For now, though, I want to keep this post short and to the point. So let me reiterate the one basic observation I want to make right now: cognitive dissonance, a mismatch between behavior and internal states, is mentally taxing. It is almost as if our brains operate like a thermodynamic system trying to minimize free energy. Beliefs and goals that conflict with behavior require extra free energy. If you want to change how you feel about something, sometimes the most effective way might be to change your behavior first.

But right now I'm not interested in the human side of the story. Mostly, what I want to ask is: does this principle apply to AI agents? The common worry with alignment is that by patching some instance of reward hacking, or by patching some observable misalignment, the AI agent will merely get better at hiding its true intent. Then, down the road, once the AI is in a position of hitherto unacquired power, it can seize the moment and pursue its true goal (which doesn't end well for the humans).

If the AI agent operates like a human, though, and the neural net architecture tries to find a state of least entropy, or some type of minimized free energy, then this shouldn't be a worry. Every instance of misaligned behavior that gets patched away is probabilistically nudging the AI agent closer to "true" alignment. One objection to this analogy is that humans have been shaped by natural selection to have incredibly efficient, and hence also lazy, brains. AI agents are not being selected primarily for how much energy they consume.

However, this might imply that creating distilled or "mini" versions of models could make them more likely to be aligned. If you continually push models to achieve the same level of performance with less compute, any resources previously available for hidden or misaligned strategies would be stripped away.
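
The post does not say how such a "mini" model would be produced. As a rough sketch under one common assumption, a smaller student model can be trained to match a larger teacher's temperature-softened output distribution. The function names, logits, and temperature below are all hypothetical, and the example only computes the matching loss; it says nothing by itself about whether the student ends up more aligned.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Made-up logits over three possible outputs.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # nearly reproduces the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different output

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The loss is zero when the student reproduces the teacher exactly and grows as their output distributions diverge, so minimizing it with a smaller network is one concrete way to "achieve the same level of performance with less compute".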

Anyway, this is all speculative and high level, but maybe some of you will find it a useful framing.


