Why “Solving Alignment” Is Likely a Category Mistake

The article argues that AI alignment is not a problem that can be solved once and for all by technical means, but a dynamic process that requires ongoing management, maintenance, and negotiation. It first questions the assumption that alignment can be reduced to a static set of goals, then examines the dilemma of choosing what to align to (individual users, corporate interests, democratic consensus, and so on), each option carrying inherent flaws. Drawing analogies to complex systems such as parent-child relationships, marriage, geopolitics, and the economy, it stresses that alignment is an ongoing condition rather than a one-time solution. Finally, it proposes that a phrase like "Successfully Navigating AI Co-evolution" better describes this dynamic, relational, and unpredictable process.

🎯**AI alignment is not a one-off technical problem but a dynamic process of ongoing management and negotiation**: the article pushes back on the view that alignment can be solved once by technical means, arguing that this framing misreads its complexity and its evolving nature.

🤔**The dilemma of choosing what to align to**: the article examines the problems with aligning AI to different targets (individual users, corporate interests, democratic consensus, and others), noting that every option carries inherent flaws and ethical challenges.

🔄**Alignment is ongoing**: drawing analogies to complex systems such as parent-child relationships, marriage, geopolitics, and the economy, the article stresses that alignment is a continuous process of adjustment and adaptation, not a static solution.

💡**"Successfully Navigating AI Co-evolution" as a more accurate description**: given how misleading the current terminology is, the article suggests a phrase like "Successfully Navigating AI Co-evolution" to better capture this dynamic, relational, and unpredictable process.

Published on May 5, 2025 4:26 AM GMT

A common framing of the AI alignment problem is that it's a technical hurdle to be overcome: a clever team at DeepMind or Anthropic would publish a paper titled "Alignment is All You Need," everyone would implement it, and we'd all live happily ever after in harmonious coexistence with our artificial friends.

I suspect this perspective constitutes a category mistake on multiple levels. Firstly, it presupposes that the aims, drives, and objectives of both the artificial general intelligence and what we aim to align it with can be simplified into a distinct and finite set of elements, a simplification I believe is unrealistic. Secondly, it treats both the AGI and the alignment target as if they were static systems. This is akin to expecting a single paper titled "The Solution to Geopolitical Stability" or "How to Achieve Permanent Marital Bliss." These are not problems that are solved; they are conditions that are managed, maintained, and negotiated on an ongoing basis.

The Problem of "Aligned To Whom?"

The phrase "AI alignment" is often used as shorthand for "AI that does what we want." But "we" is not a monolithic entity. Consider the potential candidates for the entity or values an AGI should be aligned with:

- Individual Users: Aligning to individual user preferences, however harmful or conflicting? This seems like a recipe for chaos, or for enabling malicious actors. We just saw an example of how this can go wrong with the GPT-4o update and subsequent rollback, in which positive user feedback produced an overly sycophantic model personality (one that many users still rated quite highly!) with some serious negative consequences.
- Corporate/Shareholder Interests: Aligning to proxy goals (e.g., profit, engagement) predictably generates negative externalities and is subject to Goodhart's Law at massive scale (a toy sketch of this failure mode follows the list).
- Democratic Consensus: Aligning to the will of the majority? Historical precedent suggests this can lead to the oppression of minorities. Furthermore, democratic processes are slow, easily manipulated, and often struggle with complex, long-term issues.
- AI Developer Values: Aligning to the personal values of a small, unrepresentative group of engineers and researchers? This bakes the biases and blind spots of that specific group in as de facto global operating principles. We saw how this can go wrong with the Twitter Files; imagine if that dispute had instead been about controlling the values of AGI.
- Objective Morality/Coherent Extrapolated Volition: This assumes such concepts are well-defined, discoverable, and technically specifiable, all highly uncertain propositions that humanity has so far failed to deliver on. And if we're relying on the AGI to figure this out, I'm not sure how we could "check the proof," so we'd have to assume the AGI was already aligned…and we're right back where we started.
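
To make the Goodhart's Law point above concrete, here is a minimal toy sketch (mine, not the post's). The names engagement and wellbeing, the cost curve, and all the numbers are illustrative assumptions; the only point is that selecting policies by a proxy that ignores part of what we value reliably picks a policy that scores high on the proxy and low on the true objective.

```python
import numpy as np

# Toy Goodhart's Law sketch (illustrative assumptions, not from the post):
# the "true" value of a candidate policy depends on engagement AND user
# wellbeing, but the optimizer can only see an engagement proxy.

rng = np.random.default_rng(0)

def true_value(engagement, wellbeing):
    # What we actually care about (invisible to the optimizer).
    return engagement + wellbeing

def proxy_metric(engagement, wellbeing):
    # What actually gets measured and selected on.
    return engagement

# Sample 10,000 candidate policies. Pushing engagement past a point starts
# to erode wellbeing (think addictive feeds or sycophantic chatbots).
engagement = rng.uniform(0.0, 10.0, size=10_000)
wellbeing = 5.0 - 2.0 * np.maximum(engagement - 5.0, 0.0) + rng.normal(0.0, 0.5, size=10_000)

by_proxy = np.argmax(proxy_metric(engagement, wellbeing))
by_truth = np.argmax(true_value(engagement, wellbeing))

print("proxy-optimal policy: proxy score",
      round(float(proxy_metric(engagement[by_proxy], wellbeing[by_proxy])), 2),
      "| true value", round(float(true_value(engagement[by_proxy], wellbeing[by_proxy])), 2))
print("truth-optimal policy: proxy score",
      round(float(proxy_metric(engagement[by_truth], wellbeing[by_truth])), 2),
      "| true value", round(float(true_value(engagement[by_truth], wellbeing[by_truth])), 2))
```

On a typical run the proxy-optimal policy has roughly half the true value of the truth-optimal one; the specific numbers don't matter, the divergence does.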

This isn't merely a matter of picking the "right" option. The options conflict, and the very notion of a stable, universally agreed-upon target for alignment seems implausible a priori.

The Target is Moving

The second aspect of the category mistake is treating alignment as something you achieve rather than something you maintain. Consider analogous complex systems: raising children, sustaining a marriage, managing geopolitics, steering an economy. None of them is ever "solved"; each has to be continuously managed, renegotiated, and repaired as circumstances change.

These examples illustrate what Dan Hendrycks (drawing on Rittel & Webber's 1973 work) has identified as the "wicked problem" nature of AI safety: problems that are "open-ended, carry ambiguous requirements, and often produce unintended consequences." Artificial general intelligence belongs squarely in this category of problems that resist permanent solutions.

The scale of the challenge with AGI is amplified by the potential power differential. I struggle to keep my ten-year-olds aligned with my values, and I'm considerably smarter and more powerful than they are. With AGI, we're talking about creating intelligent, agentic systems that, unlike children, will be smarter than us, think faster than us, and outnumber us. We will change, they will change, the environment will change. Maintaining alignment will be a continuous, dynamic process.

This doesn't mean we should abandon alignment research. We absolutely need the best alignment techniques possible. But we should be clear-eyed about what success looks like: not a solved problem, but an ongoing, never-ending process of negotiation, adaptation, and correction. Perhaps, given the misleading nature of the current nomenclature, a different phrase such as "Successfully Navigating AI Co-evolution" would better capture the dynamic, relational, and inherently unpredictable nature of integrating AGI with humanity.



