The Jackpot Jinx (or why “Superintelligence Strategy” is wrong)

 

This post critiques “Superintelligence Strategy”, a paper by Hendrycks, Schmidt, and Wang that advocates deterring nations from developing “destabilizing” AI through “Mutual Assured AI Malfunction (MAIM)”. The author argues that MAIM is a flawed analogy to nuclear mutual assured destruction (MAD), introduces the “Jackpot Jinx” to explain why the lure of a superintelligence breakthrough undermines strategic deterrence, examines the key stability assumptions of MAD that MAIM fails to satisfy, and criticizes the paper’s policy recommendations on export controls and hardware modifications. The author concludes that a better outcome is uncontrolled agency for a superintelligence collaboratively grown with human love.

☢️ “Mutual Assured AI Malfunction (MAIM)” is a flawed analogy to nuclear “mutual assured destruction (MAD)”: it fails to produce a Nash equilibrium, increases the risk of miscalculation, and encourages internecine strife. It would likely break the stability-instability paradox and is fraught with potential for misinterpretation.

💰 The “Jackpot Jinx” captures the idea that superintelligence, unlike nuclear war, is not a single certain negative outcome but a spectrum running from very bad things (omnicide) to very good things (superabundance). The lure of an enormous payoff from a superintelligence breakthrough destabilizes strategic deterrence and incentivizes preemptive action.

🎯 MAIM fails the key assumptions underpinning MAD’s stability: certainty vs. uncertainty, existential vs. variable threat, clear vs. subjective triggers, and symmetry vs. asymmetry. MAD relies on the certainty of nuclear retaliation, whereas MAIM can only promise the uncertain prospect of AI “malfunction” through sabotage.

🛡️ The paper’s policy recommendations on export controls and hardware modifications are weak: in a globalized world such controls are easily circumvented, creating a false sense of security without addressing the underlying risks. The effort to keep AI weapons out of the hands of “rogue actors” is similarly narrow.

❤️ A better outcome is uncontrolled agency for a superintelligence collaboratively grown with human love. The author points to the potential upside of superintelligence and the importance of developing it in collaboration with humanity.

Published on March 10, 2025 7:18 PM GMT

On March 5, 2025 Dan Hendrycks, Eric Schmidt, and Alexandr Wang published “Superintelligence Strategy”, a paper that suggests a number of policies for national security in the era of AI. Central to their recommendations is a concept they call “Mutual Assured AI Malfunction (MAIM)”, which is meant to be a deterrence regime resembling nuclear mutual assured destruction (MAD). The authors argue that MAIM will deter nations from building “destabilizing” AI through the threat of reciprocal sabotage.

 

This is a demonstrably false concept, and a poor analogy, because it fails to yield a strategy that settles into a Nash equilibrium. Instead, MAIM’s uncertain nature increases the risk of miscalculation and encourages internecine strife. It is a strategy that would likely break the stability-instability paradox, and it is fraught with the potential for misinterpretation.

 

One of the key miscalculations is the paper’s treatment of the payoffs in the event of superintelligence. Rather than treating the first nation to reach superintelligence as a winner-take-all proposition, we should think of it as something I call the “Jackpot Jinx.” This term captures how the allure of an enormous (even potentially infinite) payoff from a breakthrough in superintelligence can destabilize strategic deterrence. Essentially, the prospect of a "jackpot" can “jinx” stability by incentivizing preemptive or aggressive actions.

 

Let’s start by discussing why nuclear mutual assured destruction (MAD) yields a Pareto-optimal Nash equilibrium (that is, an equilibrium in which no player can be made better off without making another worse off). Under MAD, the inescapable threat of a retaliatory nuclear strike ensures that any unilateral move to initiate conflict would lead to mutually catastrophic consequences. The idea is that over time, and over many potential conflicts, both nations recognize that refraining from launching a first strike is the rational strategy, because any deviation would trigger an escalation that leaves both parties far worse off (i.e., both countries are nuked).

 

The equilibrium where neither nation initiates a nuclear attack becomes self-enforcing: it is the best collective outcome given the stakes involved. Any attempt to deviate, such as launching a surprise attack, would break this balance and result in outcomes that are strictly inferior for both sides, making the mutually restrained state a Pareto superior equilibrium in the strategic calculus of nuclear deterrence. You’ve probably seen this payoff matrix before:
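The matrix appears as an image in the original post. As a stand-in, here is a minimal Python sketch (my own reconstruction with illustrative numbers on the -100/100 scale described below, not necessarily the post’s exact figure) that encodes the game and checks that mutual restraint is self-enforcing:

```python
# A hedged reconstruction of the MAD game, not the post's exact figure.
# Any launch triggers retaliation, so any launch nukes everyone:
#
#                     B: Restrain        B: Launch
#   A: Restrain       (100, 100)         (-100, -100)
#   A: Launch         (-100, -100)       (-100, -100)

STRATEGIES = ("restrain", "launch")

# payoffs[(a, b)] = (payoff to A, payoff to B)
payoffs = {
    ("restrain", "restrain"): (100, 100),
    ("restrain", "launch"):   (-100, -100),
    ("launch",   "restrain"): (-100, -100),
    ("launch",   "launch"):   (-100, -100),
}

def is_nash(a: str, b: str) -> bool:
    """True if neither player gains by unilaterally deviating from (a, b)."""
    ua, ub = payoffs[(a, b)]
    a_gains = any(payoffs[(alt, b)][0] > ua for alt in STRATEGIES)
    b_gains = any(payoffs[(a, alt)][1] > ub for alt in STRATEGIES)
    return not (a_gains or b_gains)

print(is_nash("restrain", "restrain"))  # True: mutual restraint is self-enforcing
```

(On this reconstruction mutual launching is also, trivially, an equilibrium, since a doomed player cannot improve by deviating; but it is Pareto-dominated by mutual restraint, which is exactly the sense in which the restrained state is Pareto superior.)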

Here just assume -100 is like super dead, and 100 is super alive. Now there are some very important assumptions underpinning this stability which the MAIM doctrine fails to meet. Here are some that I found:

·      Certainty vs. Uncertainty: MAD reaches a stable outcome because nuclear retaliation is certain. That is, if someone launches a nuke at you, you are definitely launching back, and that all but guarantees mutual annihilation. MAIM can only guarantee the uncertain prospect of AI “malfunction” through sabotage. This encourages risk-taking behavior because it lacks the prospect of a certain response (see the sketch after this list).

·      Existential Threat vs. Variable Threat: with MAD, any nuclear strike risks obliterating the adversarial nation, so defecting is catastrophic. In contrast, MAIM’s sabotage only delays or degrades an AI project. The downside is not sufficient to deter aggressive actions.

·      Clear Triggers vs. Subjective Triggers: when you launch a nuke, it’s clear. The bomb is coming. MAIM relies on subjective judgments about what counts as a “destabilizing AI project.” Think about how dangerous that level of subjectivity is when it comes to miscalculation and unintended escalation.

·      Symmetry vs. Asymmetry: MAD works because a nuke on your city is roughly equivalent to a nuke on my city; destructive capabilities are symmetric. MAIM offers no such guarantee: cyberwarfare and other non-nuclear military capabilities are wielded unevenly by different countries.
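To make the first point (certainty vs. uncertainty) concrete, here is a minimal expected-value sketch. The probabilities and payoffs are purely illustrative assumptions of mine, not figures from the paper:

```python
# Contrast MAD's near-certain, annihilating retaliation with MAIM's
# uncertain sabotage. All numbers are illustrative assumptions.

def expected_defection_payoff(p_punished: float,
                              payoff_punished: float,
                              payoff_unpunished: float) -> float:
    """Expected value of defecting when punishment lands with probability p_punished."""
    return p_punished * payoff_punished + (1 - p_punished) * payoff_unpunished

# MAD: retaliation is near-certain and existential.
mad = expected_defection_payoff(0.99, payoff_punished=-100, payoff_unpunished=100)

# MAIM: sabotage is a coin flip and merely delays or degrades the project.
maim = expected_defection_payoff(0.5, payoff_punished=-10, payoff_unpunished=100)

print(f"MAD:  {mad:+.1f}")   # -98.0 -> defection is deterred
print(f"MAIM: {maim:+.1f}")  # +45.0 -> defection looks attractive
```

Under near-certain, existential punishment, defecting has a deeply negative expected value; under uncertain, non-existential punishment, it looks attractive. That asymmetry is the core of why MAIM fails where MAD holds.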

 

The "Jackpot Jinx" refers to the idea that, unlike nuclear warfare, superintelligence is not a single certain negative outcome. Rather, it is a spectrum that runs from very bad things (omnicide) to profoundly good things (superabundance). Let’s take another stab at the payoff matrix when we consider the Jackpot Jinx:

Here I mean:

·      Cooperate represents pursuing moderate, controlled AI development.

·      Defect (Jackpot Jinx) symbolizes aggressively pursuing superintelligence, with the risky promise of a potentially infinite payoff.

·      Attack denotes preemptive sabotage against a rival's AI project.

·      "∞" represents the potentially unlimited positive outcome for the nation that achieves the "Jackpot Jinx."

·      "β" is a variable representing the outcome for the other nation. β can range from very negative (e.g., -100) to very positive (approaching ∞, though likely less than the payoff for the "winning" nation).

The matrix shows that the “Cooperate” strategy is consistently dominated by “Defect (Jackpot Jinx)” due to the lure of an infinitely large (albeit uncertain) payoff. Even though “Attack” is risky, in a MAIM-governed world it becomes a more attractive option than simply cooperating.
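This dominance can be made explicit with a small sketch. Modeling the "∞" payoff literally (here with Python’s math.inf) and using illustrative numbers of my own rather than anything from the paper:

```python
import math

# The "∞" payoff is modeled literally with math.inf; the probability and
# the finite payoffs are illustrative assumptions.

P_JACKPOT = 0.01        # even a tiny chance of the jackpot suffices
JACKPOT = math.inf      # "∞": unbounded upside of winning the race
COOPERATE_PAYOFF = 50   # steady payoff of moderate, controlled development
BUST = -100             # catastrophic failure (omnicide) while racing

# A small chance at infinity plus a large chance of disaster is still infinite.
expected_defect = P_JACKPOT * JACKPOT + (1 - P_JACKPOT) * BUST

print(expected_defect)                     # inf
print(expected_defect > COOPERATE_PAYOFF)  # True for ANY P_JACKPOT > 0
```

As long as the jackpot is treated as unbounded, any nonzero chance of winning swamps every finite downside, so no finite threat of sabotage restores deterrence.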

 

The result is not one of stable deterrence, as with nuclear weapons, but rather an inherently unstable arms race. The “Jackpot Jinx”, the tantalizing prospect of ultimate power, will drive nations to take increasingly reckless risks. Unlike MAD, which provides a predictable, if suboptimal, balance, MAIM creates a perpetual cycle of tension, suspicion, and potential conflict, partially because superintelligence is not necessarily equated with omnicide!

 

The real downside of this way of thinking is that it not only suggests a clear game-theoretic dominant strategy (see von Neumann’s arguments on what to do before the Soviets developed the bomb), but is also myopically focused on a very anthropocentric notion of AI (as a weapon, as a tool, as something to be deterministically controlled).

 

The paper also suffers from a number of weak policy recommendations related to export controls, hardware modifications, and increased transparency. Export controls and hardware modifications are presented as ways to limit access to advanced AI capabilities, echoing the Cold War-era restrictions on nuclear materials that underpinned the MAD strategy they reference. But in a globalized world with decentralized AI compute, such controls are likely to be porous and easily circumvented, creating a false sense of security while doing little to actually address the underlying risks.

 

Nonproliferation efforts, focused on preventing “rogue actors” from acquiring AI weapons, are similarly narrow in scope. While mitigating the risks of AI-enabled terrorism is important, it’s a distraction from the far more pressing challenge of managing great power competition in AI. Focusing on “rogue actors” allows states to avoid grappling with the harder questions of international cooperation and shared governance. Furthermore, the specific framing that “…all nations have an interest in limiting the AI capabilities of terrorists” is incorrect. The correct framing is “all nations have an interest in limiting the AI capabilities of terrorists that threaten their own citizens or would prove destabilizing to their control of power.” The realization should be that your terrorist is my third-party non-state actor, utilized for plausibly deniable attacks. The paper focuses on a very narrow set of terrorists of the rarest form, groups like Aum Shinrikyo.

 

In conclusion, the “Superintelligence Strategy” paper is fundamentally flawed because its reliance on the MAIM framework presents a dangerous and unstable vision for managing advanced AI. By drawing a flawed analogy to nuclear MAD, it fails to account for the inherent uncertainties, variable threats, ambiguous triggers, and asymmetries that define the modern strategic landscape. Moreover, the concept of the “Jackpot Jinx”, the tantalizing, potentially infinite payoff of achieving superintelligence, exacerbates these issues and encourages reckless risk-taking rather than fostering a cooperative, stable deterrence. Rather than locking nations into an arms race marked by perpetual tension and miscalculation, a better outcome, and the one we should guide policymakers towards, is uncontrolled agency for a superintelligence that is collaboratively grown with human love.

 

Note: I originally posted this on X on March 6th, and I recognize that several of the statements may read as a bit antithetical to the LW community, particularly around the potential upsides of superintelligence and the concept of "human love", which is much debated in the conversation on alignment. I mostly wrote this as a sceptical take on the strategic and game-theoretic implications of "Superintelligence Strategy”. I have high respect for the amount of thought that goes into some of the conversations in this community, and I was interested in hearing other people’s takes! :)


