Cognitive dissonance is the discomfort we feel when our beliefs don't line up with our actions. More generally, the two dissonant things don't need to be a belief and an action: the discomfort can also arise when two of our beliefs contradict each other, or when our actions serve conflicting goals. Often we aren't even fully conscious of this discomfort. It is hard to notice in ourselves, but easy to spot in others.
Here are some of the old, classic examples from the empirical literature, served up and summarized courtesy of Gemini 2.5 Pro. It's unclear whether these studies would survive replication, but let's examine them nonetheless:
- The Classic Forced Compliance Study (Festinger & Carlsmith, 1959):
  - Behavior: Participants performed extremely dull and repetitive tasks (like turning pegs on a board for an hour). Afterwards, they were asked to tell the next participant (who was actually a confederate) that the tasks were very interesting and enjoyable.
  - Manipulation: Participants were paid either $1 or $20 (a significant amount in 1959) to lie. A control group did the tasks but didn't lie.
  - Dissonance: Those paid only $1 experienced high dissonance. Their behavior (lying) conflicted strongly with their belief (the task was incredibly boring), and they had insufficient external justification for the lie ($1 wasn't really enough to justify misleading someone). Those paid $20 had sufficient external justification – they could tell themselves they lied for the money.
  - Belief Change: When later asked to rate how enjoyable the tasks actually were, the participants paid $1 rated the tasks significantly more enjoyable than those paid $20 or those in the control group. To reduce the dissonance caused by lying for a paltry sum, they unconsciously changed their belief about the task itself, convincing themselves it wasn't so bad after all. The $20 group didn't need to change their belief; the money justified the lie.
- The Severe Initiation Study (Aronson & Mills, 1959):
  - Behavior: Female college students volunteered to join a group that would discuss the psychology of sex.
  - Manipulation: To be admitted to the group, participants had to undergo an "embarrassment test" (initiation). This initiation was either severe (reading obscene words and lurid sexual passages aloud), mild (reading less embarrassing sex-related words), or absent (control group). All participants then listened to a pre-recorded discussion by the group they supposedly joined, which was designed to be incredibly dull and banal.
  - Dissonance: Those who underwent the severe initiation experienced high dissonance. Their behavior (undergoing an embarrassing and difficult initiation) conflicted with the reality they encountered (the group discussion was worthless and boring). "Why did I go through that ordeal for this?"
  - Belief Change: When asked to rate the discussion and the group members, participants who had gone through the severe initiation rated both significantly more favorably than those in the mild or no-initiation conditions. To justify the effort and embarrassment of their behavior (the severe initiation), they changed their belief about the group, convincing themselves it was actually quite interesting and worthwhile.
- The Free-Choice Study (Brehm, 1956):
  - Behavior: Participants (women shoppers) were asked to rate the desirability of several household appliances (like toasters and coffee makers). As a reward for participating, they were told they could choose one of two appliances to take home.
  - Manipulation: Some participants were offered a choice between two items they had rated as highly and similarly desirable (high-dissonance condition – a difficult choice). Others were offered a choice between one highly desirable item and one they had rated much lower (low-dissonance condition – an easy choice). After making their choice, participants were asked to rate the products again.
  - Dissonance: Those in the high-dissonance condition experienced more conflict after making their choice. They had chosen one attractive item but had to reject another, almost equally attractive item. This creates dissonance: thoughts about the positive features of the rejected item and the potential negative features of the chosen item conflict with the behavior of choosing.
  - Belief Change: In the second rating, participants in the high-dissonance condition increased their rating of the item they chose and decreased their rating of the item they rejected. They effectively "spread the alternatives" in their minds to make the choice seem more obvious and justified after the fact. This reduces the dissonance by solidifying the belief that they made the right decision.
These are all pretty interesting in their own right, and at this point this essay could veer in many different directions, each of which would be fitting for a LessWrong post. For example, all of these posts are at least tangentially related to the topic:
- Rationalization
- Motivated Stopping and Motivated Continuation
- You are not too "irrational" to know your preferences.
- Suffering as attention-allocational conflict
- Where Recursive Justification Hits Bottom
- Making Beliefs Pay Rent (in Anticipated Experiences)
For now though, I want to keep this post short and to the point. So let me reiterate the one basic observation I want to make right now: cognitive dissonance, a mismatch between behavior and internal states, is mentally taxing. It is almost as if our brains operate like a thermodynamic system trying to minimize a free energy, and beliefs and goals that conflict with behavior cost extra free energy. If you want to change how you feel about something, sometimes the most effective way is to change your behavior first.
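To make the analogy concrete, here is a toy numerical sketch. Everything in it is invented for illustration: the quadratic "dissonance energy," its weights, and the names are mine, not from any published model. The point is just the dynamic: if the behavior is clamped (we already acted), minimizing the energy moves the belief toward the behavior.

```python
# Toy sketch of the free-energy framing (illustrative only; the
# quadratic dissonance_energy and its weights are made up).

def dissonance_energy(belief, behavior, prior, w_prior=0.2):
    """Energy = mismatch between belief and behavior,
    plus a mild pull back toward the original belief."""
    return (belief - behavior) ** 2 + w_prior * (belief - prior) ** 2

behavior = 1.0   # clamped: what we actually did ("I said the task was fun")
prior = -1.0     # what we originally believed ("the task was boring")
belief = prior
lr, eps = 0.1, 1e-5

for _ in range(50):
    # Descend the numerical gradient of the energy w.r.t. the belief only;
    # the behavior is held fixed, so the belief does all the moving.
    grad = (dissonance_energy(belief + eps, behavior, prior)
            - dissonance_energy(belief - eps, behavior, prior)) / (2 * eps)
    belief -= lr * grad

print(f"final belief: {belief:.3f}")  # drifts from -1.0 toward ~0.67
```

With the behavior clamped, the belief travels most of the way toward the action, which is the Festinger & Carlsmith pattern in miniature: the cheapest way to lower the energy is to revise the belief.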
But right now I'm not interested in the human side of the story. Mostly, what I want to ask is: does this principle apply to AI agents? The common worry in alignment is that by patching some instance of reward hacking, or some observable misalignment, we merely teach the AI agent to hide its true intent better. Then, down the road, once the AI is in a position of hitherto unacquired power, it can seize the moment and pursue its true goal (which doesn't end well for the humans).
If the AI agent operates like a human though, and the neural net architecture tries to settle into a state of least entropy, or some kind of minimized free energy, then this shouldn't be a worry. Every instance of misaligned behavior that gets patched away probabilistically nudges the AI agent closer to "true" alignment. One objection to this analogy is that humans were shaped by natural selection to have incredibly efficient, and hence lazy, brains; AI agents are not being selected primarily for how much energy they consume.
However, this cuts both ways: it might imply that creating distilled or "mini" versions of models could make them more likely to be aligned. If you continually push models to achieve the same level of performance with less compute, any slack capacity previously available for hidden or misaligned strategies would be stripped away.
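For concreteness, here is roughly what the standard knowledge-distillation objective looks like, as a minimal PyTorch sketch. This is the usual recipe from Hinton et al. (2015), not something specific to this post, and whether it has the alignment side effect I'm speculating about is entirely an open question:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Standard knowledge-distillation loss: a temperature-softened
    KL term pulling the student toward the teacher's distribution,
    mixed with ordinary cross-entropy on the true labels."""
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # kl_div expects log-probabilities as input and probabilities
    # as target; "batchmean" averages over the batch dimension.
    kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    ce = F.cross_entropy(student_logits, labels)
    # The temperature**2 factor keeps the soft-target gradients on
    # the same scale as the hard-label gradients.
    return alpha * temperature**2 * kl + (1 - alpha) * ce

# Example usage with dummy shapes: batch of 8, 10 classes.
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```

The speculation, in these terms, is that a smaller student forced to match the teacher's visible behavior has less spare capacity in which a divergent hidden strategy could live.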
Anyway, this is all speculative and high-level, but maybe some of you will find it a useful framing.