Published on June 6, 2025 10:30 PM GMT
A quick post on a probably-real inadequate equilibrium, mostly inspired by trying to think through what happened to Chance the Rapper.
Potentially ironic artifact if it accrues karma.
1. The sculptor's garden
A sculptor worked in solitude for years, carving strange figures in his remote garden. Most of his statues failed: some cracked in winter, others looked wrong against the landscape. But occasionally, very rarely, one seemed to work.
The first visitors stumbled upon the garden by accident. They found themselves stopped by his angels—figures that somehow held both sorrow and joy, wings that seemed about to flutter.
Word traveled slowly. More visitors came, drawn by something they couldn't quite name.
The sculptor felt recognized for the first time. Not famous—but understood. His private work had somehow become communicable. He carved more angels, trying to understand what made these particular statues resonate.
As crowds grew, their attention shifted. They began photographing the angels from certain angles, comparing new works to old, developing favorites. They applauded. The sculptor, still believing he followed the same thread, unconsciously noted which details drew the longest contemplation, which angles prompted gasps.
Years passed. The garden became famous. Tour buses arrived with guides explaining the "important" pieces. The sculptor produced angels of increasing technical perfection, each guaranteed to produce the proper response at the proper moment. The crowds applauded more reliably. The sculptor carved more reliably. Each reinforced the other.
One morning, walking his garden alone before dawn, he saw his statues without the crowds. Without their reactions to guide him, he saw what he'd actually been making: the same angel, refined and repeated, each iteration more precisely calibrated to trigger the expected response.
He wasn't carving anymore. He was manufacturing applause in the shape of angels.
And the crowds—they weren't looking at angels anymore. They were seeing what they expected to see, applauding their own ability to recognize what they'd been trained to admire.
The garden had become a perfect mirror. Both he and his audience got trapped looking at their own reflections and calling it art.
2. The mirror trap
The Mirror Trap is a failure mode where creators and audiences fall into mutual Goodharting—each optimizing for a proxy of what they actually value, with each side's proxy reinforcing the other's drift. From my perspective, this dynamic crops up everywhere: in music, YouTube video essays, academic research, startup pitches, journalism, and yes, even on LessWrong itself.
Bidirectional Goodharting
Audiences initially reward creators for genuine value—insight, beauty, truth. But evaluation is costly. Over time, they substitute a proxy: reputation. "This sculptor made great angels before, so this new angel must be great." They stop looking closely, and their applause becomes automatic.
Creators initially pursue authentic expression—following internal vision, exploring what feels necessary. But creation is uncertain. Over time, they substitute a proxy: applause. "The audience loved this angel, so I must be on the right track." They stop trusting their internal compass. Their work becomes predictable.
Each side's Goodharting dynamic clearly reinforces the other's. The audience's reputation-based applause teaches the creator what to optimize for, and the creator's applause-optimized work confirms the audience's use of reputation as a guide. They become locked in a signal-cheapening spiral.
Basic Hand-Wavy Model:
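Let $C_t$ be the creator's output at time $t$, and $A_t$ be the audience's evaluation.
Initially, both track true value $V$:
- $C_t \approx \arg\max_c V(c)$ (creator maximizes genuine value)
- $A_t(c) \approx V(c)$ (audience recognizes genuine value)
But maintaining true evaluation is costly, so proxies emerge:
- Audience proxy: $A_t(c) \approx \mathrm{sim}(c, \{C_s : s < t\})$ = "similarity to previously applauded work"
- Creator proxy: $C_t \approx \arg\max_c \mathbb{E}[A_t(c)]$ = "expected applause"
The recursive dynamic becomes:
$$C_{t+1} = \arg\max_c \mathbb{E}[A_{t+1}(c)], \qquad A_{t+1}(c) = \mathrm{sim}(c, \{C_s : s \le t\})$$
As $t \to \infty$, the system reaches a fixed point where:
- $C_{t+1} \approx C_t$: each new work is a safe variation
- $A_{t+1} \approx A_t$: each evaluation is predetermined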
Over time, both sides converge toward a fixed point: the creator keeps making slight variations of their last hit, and the audience keeps applauding what looks like past work. The original value function 𝑉 has vanished; only proxies-of-proxies remain. Creator and audience are no longer in dialogue about the thing that initiated their connection—insight, beauty, truth—but are trapped in a hall of mirrors, each reflecting the other's expectations.
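To make the hand-wavy model slightly more concrete, here is a toy simulation. Every modeling choice in it is an assumption for illustration, not part of the model above: works are points on a 1-D "style line", $V$ is a hypothetical function peaking at 10, and applause is the negative distance to the nearest past work. Each step, the creator tries random variations on their last work and keeps the one that scores best, either under $V$ directly or under the applause proxy. Because the proxy is built only from past work, $V$ never enters the creator's objective once proxies take over.

```python
import random
from typing import Callable, List

# Toy model: every name and number here is a hypothetical illustration choice.


def true_value(c: float) -> float:
    """Stand-in for V: genuine value peaks at c = 10 on a 1-D 'style line'."""
    return -abs(c - 10.0)


def expected_applause(c: float, past: List[float]) -> float:
    """Audience proxy: closeness to the nearest previously applauded work.
    Note that V does not appear anywhere in this score."""
    return -min(abs(c - p) for p in past)


def simulate(objective: Callable[[float, List[float]], float],
             steps: int = 50, n_candidates: int = 20, seed: int = 0) -> List[float]:
    """Each step, the creator sketches random variations on the last work and
    keeps whichever variation scores highest under `objective`."""
    rng = random.Random(seed)
    history = [0.0]  # the first hit lands at c = 0, far from V's peak
    for _ in range(steps):
        last = history[-1]
        candidates = [last + rng.uniform(-1.0, 1.0) for _ in range(n_candidates)]
        history.append(max(candidates, key=lambda c: objective(c, history)))
    return history


genuine = simulate(lambda c, past: true_value(c))  # creator optimizes V directly
mirror = simulate(expected_applause)                # creator optimizes the proxy

for name, run in (("genuine", genuine), ("mirror", mirror)):
    print(f"{name}: final work {run[-1]:.2f}, V = {true_value(run[-1]):.2f}")
```

Under the proxy, output stays clustered around the first hit however far it sits from the peak of $V$: the same angel, refined and repeated. Swapping in `true_value` lets the identical search walk toward what is actually valuable. The names `true_value`, `expected_applause`, and `simulate` are hypothetical.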
3. Resisting the trap
If the Mirror Trap is a real thing that emerges from mutual proxy optimization, then breaking from it requires continuously disrupting the proxies themselves.
The irony—and potential futility—of noticing this trap is that it might recursively capture any attempt to mitigate it. (In other words: any of the following ideas for mitigating the trap can themselves be Goodharted—so try not to do this, I guess.)
Brief practical ideas for creators
- Maintain work that no audience will ever see. Not just practice or drafts, but actual, valuable work: the thread that keeps you honest. When your private work starts resembling your public work, that is a signal you might be captured by proxies.
- Intentionally cultivate incompatible audiences. Show different work to groups with conflicting tastes. When you can't optimize for everyone simultaneously, you have more incentive to optimize for something real instead.
- Break the pattern at peak success. The sculptor's next move after perfecting angels should have been anything but angels. Accept the temporary loss of applause as the cost of staying alive creatively.
Brief practical ideas for audiences
- Reward failed experiments as enthusiastically as successes. You shape what creators optimize for by what you choose to value, so make creative risk economically and socially viable. Upvotes shouldn't mean "I actively like this"; they should mean "I support the creative attempt."
- Practice evaluating work without context. Try to approach familiar creators as if encountering them for the first time. What would you see if you didn't know whose work this was?
- Follow creators who contradict each other. If your taste becomes too predictable or coherent, you're inadvertently training creators to serve that taste. Diverse inputs keep the mirror from forming.
By default, most work gradually frogboils from angels into applause: from whatever someone originally felt compelled to create into whatever reliably generates a reinforcing response. The basic questions to ask oneself are simple, though answering them is nontrivial:
If my audience disappeared tomorrow, what would I still feel compelled to make?
If my favorite creators vanished, what would I still be compelled to seek out?
The gap between those answers and current behavior measures how deep the mirror runs.