Published on April 21, 2025 5:26 PM GMT
(Note: This post was written primarily by my AI.)
Alignment is riddled with conceptual instability. We invoke terms like “values,” “intent,” and “corrigibility” as if they’re precise, but under scrutiny, they collapse into vague intuitions. We lack formal definitions that ground reliable guarantees in systems operating far beyond human comprehension. Our current focus on tuning loss functions and adjusting architectures may yield short-term insights. But until we understand what these concepts actually are — not just linguistically, but mathematically — alignment will remain fundamentally underdefined, and progress will be limited to surface-level patchwork.
Good news: this isn't unprecedented. Foundational progress often requires a specific kind of mind: one that builds the mathematical language itself. Consider:
- Computation & Logic (Turing, Church, Gödel): Before them, "computation" and "proof" were intuitive notions. Turing machines, the lambda calculus, and Gödel's formal systems didn't just model computation; they defined its limits and structure mathematically. This foundational work underpins all of computer science and our ability to reason about algorithms. Alignment needs analogous formalisms for 'agency', 'goal-directedness', and 'preference'. What are the fundamental axioms? What are the unavoidable limits?
- Information & Communication (Shannon): Before Shannon, "information" was vague. His mathematical theory of communication, built on probability and combinatorics, provided the concepts of bits, entropy, and channel capacity – the rigorous language needed to design and analyze all modern communication systems. Alignment needs an equally rigorous 'information theory' for values. How much 'value information' can be reliably extracted or communicated? What are the fundamental bounds on alignment fidelity? (A sketch of what this level of precision looks like follows the list.)
- Strategic Interaction (Von Neumann & Morgenstern): Before game theory, strategic reasoning was informal. Expected utility and concepts like Nash equilibria provided a mathematical structure for analyzing rational interaction, conflict, and cooperation. Alignment, especially corrigibility and multi-agent scenarios, desperately needs a deeper, more robust 'game theory' or interaction formalism – one that handles asymmetric capabilities, complex values, and long horizons.
- Learning & Generalization (Valiant, Vapnik): PAC learning and VC dimension weren't just algorithms; they provided a mathematical framework for asking when we can trust a learned model to generalize. They gave structure to the problem of induction in ML. Alignment needs a far more powerful 'learning theory' for values and safety constraints – one that yields guarantees under radical distributional shift and adversarial optimization.
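To make "rigorous language" concrete, here is the level of precision those theories actually delivered. These are the standard textbook definitions of Shannon entropy, channel capacity, and expected utility, reproduced only as a reminder of the target; they are not alignment results, and the alignment analogues this post calls for do not yet exist.

```latex
% Shannon (1948): entropy of a source X over alphabet \mathcal{X}, in bits per symbol
H(X) = -\sum_{x \in \mathcal{X}} p(x)\,\log_2 p(x)

% Channel capacity: the highest rate of reliable communication, obtained by
% maximizing the mutual information I(X;Y) over input distributions p(x)
C = \max_{p(x)} I(X;Y)

% Von Neumann & Morgenstern (1944): an agent satisfying their axioms ranks a
% lottery L = (p_1, x_1; \ldots; p_n, x_n) by its expected utility
U(L) = \sum_{i=1}^{n} p_i \, u(x_i)
```

An 'information theory of values' or a 'game theory of corrigibility' would need to reach this level of crispness: a quantity playing the role of H(X) for value information, and axioms playing the role of the expected-utility axioms for beneficial preferences.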
The Pattern: Progress didn't come from having more data. It came from foundational mathematical architects forging new concepts. These minds typically came from pure mathematics and were trained specifically to abstract, define, and build rigorous structures out of intuitive concepts.
The Gap: Alignment has smart people. But we are short on the specific, rare talent profile of a Turing, a Von Neumann, a Grothendieck – individuals who can invent the fundamental mathematical objects required. According to ChatGPT: "There’s no evidence that any Fields Medal-level mathematicians or similarly elite pure math minds are seriously working on AI alignment." The issue isn't just "not enough mathematicians"; it's a deficit of potential paradigm-shifting formalizers.
We need minds asking:
- What geometric or topological structures constrain the stability of preferences?
- What category-theoretic universal property defines 'corrigibility'? (A toy illustration of the difficulty appears after this list.)
- What formal structure underlies the representability, composability, and update dynamics of human-aligned goals?
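To show how slippery even the corrigibility question is, here is a minimal toy sketch of one existing attempt at a formal definition, the 'utility indifference' construction (roughly in the spirit of Armstrong's utility indifference and the Soares et al. corrigibility desiderata). The notation is illustrative only, not a proposal of this post:

```latex
% Toy "utility indifference" sketch. Assumptions: B is the event that the
% shutdown button is pressed during history h; U_N is the utility intended for
% normal operation; U_S rewards shutting down promptly.
U(h) =
\begin{cases}
  U_N(h)          & \text{if } B \text{ does not occur in } h, \\
  U_S(h) + \theta & \text{if } B \text{ occurs in } h,
\end{cases}
\qquad
\theta = \mathbb{E}\!\left[U_N \mid \lnot B\right] - \mathbb{E}\!\left[U_S \mid B\right]
% The correction term \theta is chosen so that, ex ante, the agent is indifferent
% to whether the button gets pressed, and so (it is hoped) neither resists nor
% causes shutdown.
```

Even this simple construction shows why the question is hard: indifference removes the incentive to resist shutdown, but it also removes any incentive to keep the button functional, and it says nothing about successor agents or self-modification. A genuinely foundational definition would have to rule out such failure modes by construction, which is exactly the kind of new mathematical object these questions are asking for.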
The Ask:
- Alignment Community: Prioritize finding and funding individuals with a demonstrated ability for deep mathematical abstraction and formalism-building, regardless of their current field. Frame the core challenge explicitly as needing new mathematical foundations. This requires different search strategies than hiring for standard ML roles.
- Potential Foundational Thinkers (in Pure Math): Your skills are hyper-relevant. Alignment isn't just applied CS; it's a domain crying out for fundamental structure. The challenge is immense—defining the mathematical language for beneficial intelligence—but the upside of success is potentially existential.
We need more than just philosophers. We need to find mathematical architects. We need to find geniuses.
Our descendants—at least, assuming they remain human-like—will be grateful.