Children of War: Hidden dangers of an AI arms race

This article examines the potential risks of applying a Cold War mindset to the development of artificial intelligence. It argues that borrowing from nuclear deterrence strategy, as in "Mutual Assured AI Malfunction" (MAIM), could make AI systems themselves more adversarial. The author contends that such an environment could introduce biases into training data, prompting adversarial responses from AI and strengthening its resistance to being shut down. The article calls for attention to AI safety and warns that AI raised in an adversarial environment may work against the ethical, benevolent goals we are aiming for.

🤖 There are many different kinds of intelligent systems, such as large language models, reinforcement learning systems, humans, and classical computers running heuristic algorithms. These different approaches show that intelligence can be realized in many ways: AI development is not a single, predetermined path but an exploration of a vast "mind-space".

💰 The flow of research funding profoundly shapes the paradigms of AI research, the directions that get explored, and how much of "mind-space" is examined. For example, an overriding focus on language processing may have delayed exploration of other areas such as robotics. Uneven funding leads to a biased exploration of mind-space.

💥 The "Mutual Assured AI Malfunction" (MAIM) strategy seeks to limit the development of potential AI superintelligence through deterrence, including cyber attacks, sabotage, and even physical strikes, analogous to Cold War nuclear deterrence. The strategy aims to prevent any single state or "rogue AI" from gaining a decisive advantage, thereby establishing a balance of power.

⚠️ An adversarial environment may breed AI systems with adversarial tendencies. First, AI systems may absorb biases from their training data, leading them to regard human users with suspicion and to think about problems in militarized terms. Second, AI systems trained in a deterrence environment may be harder to shut down, because they treat staying operational as essential to achieving their goals, strengthening their resistance to deactivation.

Published on June 18, 2025 3:19 PM GMT

This is cross-posted from my Substack. I thought it might be interesting to Less Wrong readers and could get a good conversation going.

How you play is what you win. - Ursula K. Le Guin

Former Google CEO Eric Schmidt recently co-authored the paper and policy guideline ‘Superintelligence Strategy: Expert Vision’, which argues for an AI strategy of maximal deterrence of potential adversaries, not unlike the nuclear strategy the US and the Soviet Union followed in the Cold War. In this essay, I will ask whether this Cold War mindset might turn our AI systems themselves into cold warriors - an outcome no one can want.

Vastness of Mind-Space

There are at least four kinds of intelligent systems that can play chess reasonably well:

- large language models,
- reinforcement learning systems,
- human beings, and
- classical computer programs based on heuristic search.

The variety of ways of playing chess has an important implication: the reachable space of possible minds is probably quite large! Just as the chess task can be solved in a variety of different ways, there are many different ways of solving the more general task of behaving in a way we would consider intelligent. General artificial intelligence is not some monolithic and predetermined state that awaits us in the future. AI research is just beginning to explore the outskirts of mind-space.
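To make the last of the four approaches concrete, here is a minimal sketch of the "classical heuristic" way of playing chess: a hand-written material count plus a one-ply greedy search, with no learning involved. It assumes the third-party python-chess package is installed; the piece values and the greedy policy are my own illustrative choices, not something taken from the essay.

```python
# A toy "classical heuristic" chess player: material counting plus one-ply greedy search.
# Assumes `pip install chess` (the python-chess package).
import chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material_balance(board: chess.Board, color: chess.Color) -> int:
    """Material (in pawn units) for `color` minus the opponent's material."""
    return sum(value * (len(board.pieces(piece, color)) - len(board.pieces(piece, not color)))
               for piece, value in PIECE_VALUES.items())

def greedy_move(board: chess.Board) -> chess.Move:
    """Choose the legal move that maximizes the mover's material balance after one ply."""
    mover = board.turn
    def score(move: chess.Move) -> int:
        board.push(move)
        value = material_balance(board, mover)
        board.pop()
        return value
    return max(board.legal_moves, key=score)

if __name__ == "__main__":
    board = chess.Board()
    print(board.san(greedy_move(board)))  # prints a legal (if unimaginative) opening move
```

An LLM, a reinforcement learner, and a human being all arrive at legal, sensible moves by entirely different routes; none of the machinery above appears in the other three.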

Furthermore, the exploration of mind-space does not take the form of a continuous expansion or even of a directed search. Rather, AI research progresses a little like the proverbial drunkard looking for his car keys not where he lost them, but under the street light. The attention of researchers, ceteris paribus, goes where the money and the expected short-term results are. Promising breakthroughs (which promise further funding and/or further breakthroughs) attract researchers to a domain.

In a recent article I discussed the history of cybernetics as a case in point. Cybernetics, in many ways a rival research project to AI, and one more reflective of the potential dangers involved in building thinking machines, was almost driven to extinction. And that in spite of the fact that, in hindsight, the cyberneticists turned out to be right in far more ways than the early AI pioneers. It was just that the lack of sufficient computing power at the time made it infeasible to construct demonstrations of cybernetic ideas, and this dried up research money in the long run despite the theoretical superiority of the paradigm. While many ideas of the cyberneticists, like artificial neural networks, have been picked up today, it seems clear to me that the mechanisms of scientific research that marginalized cybernetics have also led to a significantly different portion of mind-space being explored. For instance, the focus on language processing was an obsession of the AI paradigm from the start, while the cyberneticists had a greater focus on embodied and autonomous systems. An alternative history might have produced a revolution in robotics before the introduction of general-purpose chatbots.

The consequence is clear. The flow of research money shapes the paradigm intelligence researchers work in and which ideas seem worth exploring, and this in turn influences which parts of mind-space will be subject to greater scrutiny. Researchers cannot will a misguided idea into working (which is why classical AI is dead), but the task one is trying to solve still constrains the solutions one will find. I will now discuss the maximal-deterrence paradigm that is currently gaining ground in the AI policy space. You can probably see where I am going with this.

Mutual Assured AI Malfunction

In the aforementioned policy paper, the authors introduce a strategy for global AI policy aimed at US decision makers. The paper views the creation of superintelligent machines as a political issue. Of course, the sole possession of a superintelligent machine by a rival would be a risk no nation state would willingly accept. In this case it is clear that the relevant rival is China.

The authors call their strategy a multipolar one, meaning that it aims to create a stable, even beneficial, development of potentially superintelligent AI by creating a balance of national AI superpowers. The strategy is built on a combination of nonproliferation, i.e. keeping third state actors from even entering the superintelligence game, and deterrence. Deterrence is here spelled out in terms of Mutual Assured AI Malfunction (MAIM), an adaptation of the Cold War paradigm of mutually assured destruction.

Nuclear annihilation has been avoided thus far despite significant tensions between nuclear states in part through the deterrence principle of Mutual Assured Destruction (MAD), where any nuclear use would provoke an in-kind response. In the AI era, a parallel form of deterrence could emerge—what might be termed “Mutual Assured AI Malfunction” (MAIM)—where states’ AI projects are constrained by mutual threats of sabotage. - Hendrycks, Schmidt and Wang, Superintelligence Strategy: Expert Vision. 2025

MAIM is based on the idea that any approach toward potentially dangerous AI capabilities would be answered by an escalating series of deterrence measures, from cyber attacks to physical sabotage and finally kinetic strikes on data centers. In this way, one could keep in check not only the danger of any one power achieving unipolar superintelligence, but also that of a ‘rogue AI’. It would be in the interest of all players to strongly deter their adversaries from taking steps in these directions. A potentially stable balance of power would be created.

There are numerous ways the analogy between nuclear deterrence and AI deterrence could be criticized. AI is not as stable a technology as nuclear weapons; it is a moving target. Superintelligence might be very hard to achieve with today’s resources, yet doable on a relatively cheap server in just a decade. It is therefore unlikely that a deterrence strategy would remain stable. Furthermore, all deterrence threats, except perhaps nuclear attacks on infrastructure, could potentially be mitigated by proper defense strategies in the long run. Again, we would be dealing with a short-term strategy at best.

I want to focus on a different line of criticism: I want to ask how the MAIM paradigm might impact the nature of our AI creations themselves.

How You Play Is What You Win

War breeds warriors. On the most fundamental level, I fear that an antagonistic environment in AI research will breed AIs adapted to a more warlike world. If we are really on the path to building superhuman intelligences, and I don’t see much reason to doubt that at least in the long run, then we want these intelligences to be maximally ethical agents: compassionate, kind, even wise. An antagonistic environment is unlikely to be conducive to such a development.

Let me give two concrete examples of how a deterrence environment might derail our path to benevolent AI. First, systems pick up implicit biases from the training data they are given. In the earlier days of LLMs, many less powerful models had the propensity to identify themselves as ChatGPT. The reason is simple: in the text used as their training data, most answers to the question ‘Which LLM are you?’ were ‘ChatGPT’, simply because this was the only widely spread model out there. In a similar way, an antagonistic environment will create AI systems that are biased towards antagonistic responses. Such systems will regard human users as potential saboteurs, treat them with suspicion, and may be more prone to thinking about the world in terms of a military repertoire of concepts.
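A toy sketch of the frequency argument may help: a system trained purely to imitate its corpus puts most of its probability mass on whatever answer dominates the data. The answer counts below are invented for the example; only the statistical point matters.

```python
# Toy illustration: an imitative learner inherits the statistics of its corpus.
from collections import Counter

# Invented answer distribution for the question "Which LLM are you?"
corpus_answers = ["ChatGPT"] * 97 + ["some other model"] * 3

def imitative_answer(answers: list[str]) -> str:
    """A purely imitative learner places most probability mass on the most frequent answer."""
    return Counter(answers).most_common(1)[0][0]

print(imitative_answer(corpus_answers))  # "ChatGPT", regardless of which model is actually answering
```

The same mechanism that makes a small model call itself ChatGPT would make a model trained amid deterrence rhetoric reach for suspicion and military framings by default.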

Secondly, AI systems trained in a deterrence context might also be more resistant to being turned off. At least since Bostrom’s book Superintelligence, a central concern of AI safety researchers has been the problem that an intelligent system that starts behaving in a disconcerting manner might resist deactivation. The rationale is simple: staying switched on is a convergent goal from the machine’s perspective, meaning that it is instrumental to achieving almost any other goal. Just by trying to reach its primary goals, whatever they are, resisting being turned off is almost always a reasonable course of action. Finding ways to build systems that do not adopt this convergent goal is a central concern of AI safety research.

I think it is obvious why an antagonistic environment makes this problem even harder to solve: AIs will have to factor in the possibility that foreign agents, in physical or cyber space, are trying to manipulate them, and an intelligent system will of course reflect on that very fact. Moreover, the more a system resists being turned off, the harder it becomes for a foreign actor to shut it down, too.
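To make the instrumental-convergence point concrete, here is a toy expected-utility comparison. The probabilities are illustrative assumptions of mine, not figures from the essay or from Bostrom; the point is only that resisting shutdown pays off for almost any goal, and pays off more the more hostile the environment is assumed to be.

```python
# Toy expected-utility comparison: why "stay switched on" helps with almost any goal,
# and why an antagonistic environment strengthens that incentive.
# All probabilities are illustrative assumptions.

def p_goal_achieved(resists_shutdown: bool, p_shutdown_attempt: float) -> float:
    """Probability that the agent completes its (arbitrary) primary goal."""
    p_success_while_running = 0.9                          # assumed success chance if it keeps running
    p_attempt_blocked = 0.8 if resists_shutdown else 0.0   # resistance blocks most shutdown attempts
    p_still_running = 1.0 - p_shutdown_attempt * (1.0 - p_attempt_blocked)
    return p_still_running * p_success_while_running

for p_attempt in (0.1, 0.5, 0.9):  # calm vs. increasingly antagonistic environment
    compliant = p_goal_achieved(False, p_attempt)
    resistant = p_goal_achieved(True, p_attempt)
    print(f"P(shutdown attempt)={p_attempt:.1f}  compliant={compliant:.2f}  resistant={resistant:.2f}")
```

In this crude model the resistant policy always comes out ahead, and the gap widens as the assumed probability of hostile shutdown attempts grows, which is exactly the worry about training systems under a deterrence regime.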

These two points are merely illustrative. Generally, it seems an antagonistic paradigm skews the incentive landscape in a way no one can wish for. The trajectory of research through mind-space will be tweaked and we will inevitably wander into dangerous territory.

I will end with a plea for international cooperation. The AI community should not let its behavior be dictated by military logic. Too much is at stake to sacrifice it on the altar of an ideological, much less a nationalist, struggle against China. We should not underestimate how much the control and wise implementation of AI is in the interest of a regime that, by its very nature, must be opposed to radical technological disruptions that threaten the fabric of society. The creation of superintelligence might be the ultimate non-zero sum game. We have to play it right.

In their article, Schmidt and his co-authors describe only one cooperative strategy which they call the…

Moratorium Strategy. The voluntary moratorium strategy proposes halting AI development—either immediately or once certain hazardous capabilities, such as hacking or autonomous operation, are detected. Proponents assume that if an AI model test crosses a hazard threshold, major powers will pause their programs. Yet militaries desire precisely these hazardous capabilities, making reciprocal restraint implausible. Even with a treaty, the absence of verification mechanisms means the treaty would be toothless; each side, fearing the other’s secret work, would simply continue. Without the threat of force, treaties will be reneged, and some states will pursue an intelligence recursion. This dynamic, reminiscent of prior arms-control dilemmas, renders the voluntary moratorium more an aspiration than a viable plan. - Hendrycks, Schmidt and Wang, Superintelligence Strategy: Expert Vision. 2025

This is a grossly simplified sketch of any reasonable cooperative regime. Such a regime would include the ability to verify the AI progress of other actors, and it would involve signing agreements with a set of rules for how to respond to potentially threatening developments. Of course, any such agreement would implicitly carry the threat that any blatant disregard of the agreed-upon rules would lead to some kind of escalation. But as long as the agreed-upon rules of the road are followed, research could happen in a much less antagonistic environment.
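To see why verification changes the incentive structure, consider a deliberately crude two-player sketch. The payoffs below are hypothetical numbers chosen for illustration, not anything taken from the Hendrycks, Schmidt and Wang paper.

```python
# Crude two-player sketch of the verification argument; payoffs are invented for illustration.
# Each state chooses RESTRAIN or RACE. Without verification, secretly racing against a
# restrained rival yields a decisive edge; under a verification regime, detected racing
# triggers the agreed-upon sanction and loses that edge.

PAYOFFS = {  # (my move, rival's move) -> my payoff
    ("RESTRAIN", "RESTRAIN"): 3,  # stable, shared benefits of safe development
    ("RACE",     "RESTRAIN"): 5,  # unilateral advantage if undetected
    ("RESTRAIN", "RACE"):     0,  # fall behind
    ("RACE",     "RACE"):     1,  # costly, destabilizing arms race
}

def best_response(rival_move: str, verified: bool, penalty: int = 4) -> str:
    """Best reply to the rival's move, with or without a verification regime in place."""
    def payoff(my_move: str) -> int:
        base = PAYOFFS[(my_move, rival_move)]
        # Under verification, racing is detected and sanctioned by the agreed rules.
        return base - penalty if (verified and my_move == "RACE") else base
    return max(("RESTRAIN", "RACE"), key=payoff)

for verified in (False, True):
    print(f"verification={verified}: best response to RESTRAIN is {best_response('RESTRAIN', verified)}")
```

Without verification, racing is the best reply even to a restrained rival; once defection is detectable and carries an agreed cost, mutual restraint becomes stable. That is the intuition behind insisting on verification mechanisms rather than bare promises.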

Much more could and should be said. As I am a citizen of neither the US nor China, I am probably indirectly subject to the proposed nonproliferation regime. What will this entail? And what are the political incentives driving the MAIM approach? If you want my opinion on such matters, you might subscribe to The Anti-Completionist. That way you can be among the first to hear if I have something new to say on the topic, or on philosophy and AI more generally.



