Cognitive dissonance is the discomfort we feel when our beliefs don't line up with our actions. More generally, the two dissonant things don't need to be a belief and an action: the discomfort can also arise when two of our beliefs contradict each other, or when our actions serve conflicting goals. Often we aren't even fully conscious of this discomfort. It is hard to notice in ourselves, but easy to spot in others.
Here are some of the old, classic examples from the empirical literature, served up and summarized courtesy of Gemini 2.5 Pro. It's unclear whether these studies would survive replication, but let's examine them nonetheless:
- The Classic Forced Compliance Study (Festinger & Carlsmith, 1959):
  - Behavior: Participants performed extremely dull and repetitive tasks (like turning pegs on a board for an hour). Afterwards, they were asked to tell the next participant (who was actually a confederate) that the tasks were very interesting and enjoyable.
  - Manipulation: Participants were paid either $1 or $20 (a significant amount in 1959) to lie. A control group did the tasks but didn't lie.
  - Dissonance: Those paid only $1 experienced high dissonance. Their behavior (lying) conflicted strongly with their belief (the task was incredibly boring), and they had insufficient external justification for the lie ($1 wasn't really enough to justify misleading someone). Those paid $20 had sufficient external justification – they could tell themselves they lied for the money.
  - Belief Change: When later asked to rate how enjoyable the tasks actually were, the participants paid $1 rated the tasks significantly more enjoyable than those paid $20 or those in the control group. To reduce the dissonance caused by lying for a paltry sum, they unconsciously changed their belief about the task itself, convincing themselves it wasn't so bad after all. The $20 group didn't need to change their belief; the money justified the lie.
- The Severe Initiation Study (Aronson & Mills, 1959):
  - Behavior: Female college students volunteered to join a group that would discuss the psychology of sex.
  - Manipulation: To be admitted to the group, participants had to undergo an "embarrassment test" (initiation). This initiation was either severe (reading obscene words and lurid sexual passages aloud), mild (reading less embarrassing sex-related words), or absent (control group). All participants then listened to a pre-recorded discussion by the group they supposedly joined, which was designed to be incredibly dull and banal.
  - Dissonance: Those who underwent the severe initiation experienced high dissonance. Their behavior (undergoing an embarrassing and difficult initiation) conflicted with the reality they encountered (the group discussion was worthless and boring). "Why did I go through that ordeal for this?"
  - Belief Change: When asked to rate the discussion and the group members, participants who had gone through the severe initiation rated both significantly more favorably than those in the mild or no-initiation conditions. To justify the effort and embarrassment of their behavior (the severe initiation), they changed their belief about the group, convincing themselves it was actually quite interesting and worthwhile.
- The Free-Choice Study (Brehm, 1956):
  - Behavior: Participants (women shoppers) were asked to rate the desirability of several household appliances (like toasters and coffee makers). As a reward for participating, they were told they could choose one of two appliances to take home.
  - Manipulation: Some participants were offered a choice between two items they had rated as highly and similarly desirable (high-dissonance condition – a difficult choice). Others were offered a choice between one highly desirable item and one they had rated much lower (low-dissonance condition – an easy choice). After making their choice, participants were asked to rate the products again.
  - Dissonance: Those in the high-dissonance condition experienced more conflict after making their choice. They had chosen one attractive item but had to reject another, almost equally attractive item. This creates dissonance: thoughts about the positive features of the rejected item and the potential negative features of the chosen item conflict with the behavior of choosing.
  - Belief Change: In the second rating, participants in the high-dissonance condition increased their rating of the item they chose and decreased their rating of the item they rejected. They effectively "spread the alternatives" in their minds to make the choice seem more obvious and justified after the fact. This reduces the dissonance by solidifying the belief that they made the right decision.
These are all pretty interesting in their own right, and at this point this essay could veer in many different directions, each of which would be fitting for a LessWrong post. For example, all of these posts are at least tangentially related to the topic:
- Rationalization
- Motivated Stopping and Motivated Continuation
- You are not too "irrational" to know your preferences.
- Suffering as attention-allocational conflict
- Where Recursive Justification Hits Bottom
- Making Beliefs Pay Rent (in Anticipated Experiences)
For now though, I want to keep this post short and to the point. So let me reiterate the one basic observation I want to make right now: cognitive dissonance, a mismatch between behavior and internal states, is mentally taxing. It is almost as if our brains operate like a thermodynamic system trying to minimize a free energy, and beliefs and goals that conflict with behavior cost extra free energy. If you want to change how you feel about something, sometimes the most effective way is to change your behavior first.
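To make the analogy concrete, here is a toy numerical sketch. Everything in it is invented for illustration: the quadratic "dissonance energy," its weights, and the names are mine, not from any published model. The point is just the dynamic: if the behavior is clamped (we already acted), minimizing the energy moves the belief toward the behavior.

```python
# Toy sketch of the free-energy framing (illustrative only; the
# quadratic dissonance_energy and its weights are made up).

def dissonance_energy(belief, behavior, prior, w_prior=0.2):
    """Energy = mismatch between belief and behavior,
    plus a mild pull back toward the original belief."""
    return (belief - behavior) ** 2 + w_prior * (belief - prior) ** 2

behavior = 1.0   # clamped: what we actually did ("I said the task was fun")
prior = -1.0     # what we originally believed ("the task was boring")
belief = prior
lr, eps = 0.1, 1e-5

for _ in range(50):
    # Descend the numerical gradient of the energy w.r.t. the belief only;
    # the behavior is held fixed, so the belief does all the moving.
    grad = (dissonance_energy(belief + eps, behavior, prior)
            - dissonance_energy(belief - eps, behavior, prior)) / (2 * eps)
    belief -= lr * grad

print(f"final belief: {belief:.3f}")  # drifts from -1.0 toward ~0.67
```

With the behavior clamped, the belief travels most of the way toward the action, which is the Festinger & Carlsmith pattern in miniature: the cheapest way to lower the energy is to revise the belief.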
But right now I'm not interested in the human side of the story. Mostly, what I want to ask is: does this principle apply to AI agents? The common worry in alignment is that by patching some instance of reward hacking, or some observable misalignment, we merely teach the AI agent to hide its true intent better. Then, down the road, once the AI is in a position of hitherto unacquired power, it can seize the moment and pursue its true goal (which doesn't end well for the humans).
If the AI agent operates like a human though, and the neural net architecture tries to settle into a state of least entropy, or some kind of minimized free energy, then this shouldn't be a worry. Every instance of misaligned behavior that gets patched away probabilistically nudges the AI agent closer to "true" alignment. One objection to this analogy is that humans were shaped by natural selection to have incredibly efficient, and hence lazy, brains; AI agents are not being selected primarily for how much energy they consume.
However, this cuts both ways: it might imply that creating distilled or "mini" versions of models could make them more likely to be aligned. If you continually push models to achieve the same level of performance with less compute, any slack capacity previously available for hidden or misaligned strategies would be stripped away.
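For concreteness, here is roughly what the standard knowledge-distillation objective looks like, as a minimal PyTorch sketch. This is the usual recipe from Hinton et al. (2015), not something specific to this post, and whether it has the alignment side effect I'm speculating about is entirely an open question:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Standard knowledge-distillation loss: a temperature-softened
    KL term pulling the student toward the teacher's distribution,
    mixed with ordinary cross-entropy on the true labels."""
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # kl_div expects log-probabilities as input and probabilities
    # as target; "batchmean" averages over the batch dimension.
    kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    ce = F.cross_entropy(student_logits, labels)
    # The temperature**2 factor keeps the soft-target gradients on
    # the same scale as the hard-label gradients.
    return alpha * temperature**2 * kl + (1 - alpha) * ce

# Example usage with dummy shapes: batch of 8, 10 classes.
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```

The speculation, in these terms, is that a smaller student forced to match the teacher's visible behavior has less spare capacity in which a divergent hidden strategy could live.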
Anyway, this is all speculative and high-level, but maybe some of you will find it a useful framing.