This post is cross-posted from my Substack. I thought it might be interesting to LessWrong readers and could get a good conversation going.
How you play is what you win. - Ursula K. Le Guin
Former Google CEO Eric Schmidt recently co-authored the paper and policy guideline ‘Superintelligence Strategy: Expert Version’, which argues for an AI strategy of maximal deterrence of potential adversaries, not unlike the nuclear strategy the US and the Soviet Union followed during the Cold War. In this essay, I ask whether this Cold War mindset might turn our AI systems themselves into cold warriors - an outcome no one should want.
Vastness of Mind-Space
There are at least four kinds of intelligent systems that can play chess reasonably well:
- Large language models (LLMs) with reasoning capabilities, though they don’t play a good game so far. These systems learn to play by abstracting what counts as a good move from the written chess record available in their training data. ‘Reasoning’ here means that the models talk to themselves before giving a final answer, eliminating inconsistencies.
- Reinforcement learning systems that play against themselves and learn from their mistakes. These systems have no representations of states of affairs outside the chess board. Just like LLMs, reinforcement learning systems learn from an error signal using an algorithm known as gradient descent.
- Humans, of course. It is plausible that the human brain also learns by a process of error minimization; however, it is generally believed that the brain does not use gradient descent (though there are dissenters). The exact algorithm is still a subject of speculation.
- Classic chess engines, based on heuristics and typically a minimax search algorithm (see the sketch after this list). These are the least interesting for the purposes of my argument because it is unclear whether these methods can be scaled to other tasks.
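To make the contrast concrete, here is a minimal sketch of that last approach: plain minimax search over a tiny, hand-coded game tree standing in for real chess (which would add a heuristic position evaluation, alpha-beta pruning, and depth limits). The tree, its leaf scores, and all names are invented for illustration only.

```python
# Minimal illustration of classic minimax search over a toy game tree.
# The tree and its leaf evaluations are made up; real chess engines generate
# positions on the fly and cut the search off with heuristics.

from typing import Dict, List, Union

# Internal nodes map to a list of child names; leaves map to a score
# from the maximizing player's point of view.
GameTree = Dict[str, Union[List[str], int]]

TREE: GameTree = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
    "a1": 3, "a2": 5,    # leaf evaluations
    "b1": -2, "b2": 9,
}

def minimax(node: str, maximizing: bool) -> int:
    """Best achievable score from `node`, assuming both sides play optimally."""
    value = TREE[node]
    if isinstance(value, int):      # leaf: return its static evaluation
        return value
    child_scores = [minimax(child, not maximizing) for child in value]
    return max(child_scores) if maximizing else min(child_scores)

if __name__ == "__main__":
    # The maximizer moves at the root, the minimizer replies one level down.
    print(minimax("root", maximizing=True))  # prints 3: branch 'a' guarantees 3, branch 'b' risks -2
```

Even in this stripped-down form the structural difference to the learning-based approaches is visible: nothing here is learned from data, and nothing generalizes beyond the game the tree encodes - which is precisely why this family of methods matters least for the argument that follows.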
The variety of ways of playing chess has an important implication: The reachable space of possible minds is probably quite large!1 Just as the chess task can be solved in a variety of different ways, there are many different ways of solving the more general task of behaving in a way we would consider intelligent. General artificial intelligence is not some monolithic and predetermined state that awaits us in the future. AI research is just beginning to explore the outskirts of mind-space.
Furthermore, the exploration of mind-space does not take the form of a continuous expansion or even of a directed search. Rather, AI research progresses a little like the proverbial drunkard looking for his car keys not where he lost them, but under the street light. The attention of researchers, ceteris paribus, goes where the money and the expected short-term results are. Promising breakthroughs (promising, that is, further funding and/or further breakthroughs) attract researchers to a domain.
In a recent article I discussed the history of cybernetics as a case in point. Cybernetics, in many ways a rival research project to AI, and one more reflective of the potential dangers involved in building thinking machines, has been almost driven to extinction. And that in spite of the fact that, in hindsight, we see that the cyberneticists were right in a lot more ways than the early AI pioneers. It was just that the lack of sufficient computing power at the time made it infeasible to construct demonstrations of cybernetic ideas. And this led to a drying up of research money in the long run in spite of the theoretical superiority of the paradigm. While many ideas of the cyberneticists, like artificial neural networks, have been picked up today, it seems clear to me that the mechanisms of scientific research that have led to the marginalization of cybernetics have also led to a significantly different portion of mind-space being explored. For instance, the focus on language processing was an obsession of the AI paradigm from the start, while cyberneticists had a greater focus on embodied and autonomous systems.2 An alternative history might have resulted in a revolution in robotics before the introduction of general purpose chat-bots.
The consequence is clear. The flow of research money shapes the paradigm intelligence researchers work in and which ideas seem worth exploring, and this in turn influences which parts of mind-space will be subject to greater scrutiny. Researchers cannot will a misguided idea into working (which is why classical AI is dead), but the tasks one is trying to solve still constrain the solutions one will find. I will now discuss the maximal deterrence paradigm that is gaining ground in the AI policy space at the moment. You can probably see where I am going with this.
Mutual Assured AI Malfunction
In the aforementioned policy paper, the authors introduce a strategy for global AI policy aimed at US decision makers. The paper views the creation of superintelligent machines as a political issue. Of course, the sole possession of a superintelligent machine by a rival would be a risk no nation state would willingly accept. In this case it is clear that the relevant rival is China.
The authors call their strategy a multipolar one, meaning that it aims to create a stable, even beneficial, development of potentially superintelligent AI by creating a balance of national AI superpowers. The strategy is built on a combination of nonproliferation, i.e. keeping third state actors from even entering the superintelligence game, and deterrence. Deterrence is here spelled out in terms of Mutual Assured AI Malfunction (MAIM), an adaptation of the Cold War paradigm of mutually assured destruction.
Nuclear annihilation has been avoided thus far despite significant tensions between nuclear states in part through the deterrence principle of Mutual Assured Destruction (MAD), where any nuclear use would provoke an in-kind response. In the AI era, a parallel form of deterrence could emerge—what might be termed “Mutual Assured AI Malfunction” (MAIM)—where states’ AI projects are constrained by mutual threats of sabotage. - Hendrycks, Schmidt and Wang, Superintelligence Strategy: Expert Version. 2025
MAIM is based on the idea that any approach toward potentially dangerous AI capabilities would be answered by an escalating series of deterrence measures, from cyber attacks to physical manipulations by saboteurs and, finally, kinetic strikes on data centers. In this way, you could keep in check not only the danger of any one power achieving unipolar superintelligence, but also that of a ‘rogue AI’. It would be in the interest of all players to strongly deter their adversaries from taking steps in these directions. A potentially stable balance of power would be created.
There are numerous ways the analogy between nuclear deterrence and AI deterrence could be criticized. AI is not as stable a technology as nuclear weapons are; it is a moving target. Superintelligence might be very hard to achieve with today’s resources, but it might be doable on a relatively cheap server in just a decade. It is therefore unlikely that a deterrence strategy would remain stable. Furthermore, all deterrence threats, except perhaps nuclear attacks on infrastructure, seem like they could be mitigated by proper defense strategies in the long run. Again, we would be dealing with a short-term strategy at best.
I want to focus on a different line of criticism: I want to ask how the MAIM paradigm might impact the nature of our AI creations themselves.
How You Play Is What You Win
War breeds warriors. On the most fundamental level, I fear that an antagonistic environment in AI research will breed AIs that are adapted to a more warlike world. If we are really on the path to building superhuman intelligences, and I don’t see much reason to doubt that, at least in the long run, then we want these intelligences to be maximally ethical agents: compassionate, kind, even wise. An antagonistic environment is unlikely to be conducive to such a development.
Let me give two concrete examples of how a deterrence environment might derail our path to benevolent AI. First, systems could pick up implicit biases from the training data they are given. In the earlier days of LLMs, many less powerful models had the propensity to identify themselves as ChatGPT. The reason is simple: in the text they used as their training data, most answers to the question ‘Which LLM are you?’ were ‘ChatGPT’, simply because this was the only widely used model out there. In a similar way, an antagonistic environment will create AI systems that are biased towards antagonistic responses. Such a system will regard human users (potential saboteurs) with suspicion and might be more prone to thinking about the world in terms of a military repertoire of concepts.
Secondly, AI systems trained in a deterrence context might also be more resistant to being turned off. At least since Bostrom’s book Superintelligence, a central concern of AI safety researchers has been the problem that an intelligent system that starts behaving in a disconcerting manner might resist deactivation. The rationale behind this is simple: staying turned on is considered a convergent goal from the machine's perspective, meaning that it is instrumental in achieving almost any other goal. For a system simply trying to reach its primary goals, whatever these are, resisting being turned off is almost always a reasonable course of action. Finding ways to build systems that do not adopt this convergent goal is a central concern of AI safety research.
I think it is obvious why an antagonistic environment will make this problem even harder to solve: AIs will have to factor in the possibility that foreign agents, in physical or in cyber space, are trying to manipulate them. And an intelligent system will of course reflect on that very fact. The more a system resists being turned off, the harder it also becomes for a foreign actor to shut it down.
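To make this tangible, here is a toy back-of-the-envelope calculation - my own illustration, not a model from the Superintelligence Strategy paper. An expected-value maximizer with an arbitrary goal compares ‘allow shutdown’ with ‘resist shutdown’ under a crude two-outcome model; the goal values, probabilities and the resistance cost are all invented. The only point is that once the perceived threat of outside interference rises, as it would in an antagonistic environment, resisting wins for almost any goal.

```python
# Toy illustration (invented numbers): why staying on is a 'convergent goal'
# for a pure expected-value maximizer, and why a higher perceived threat of
# interference makes resisting shutdown even more attractive.

def expected_goal_value(goal_value: float,
                        p_shutdown: float,
                        resist: bool,
                        resistance_cost: float = 0.1) -> float:
    """Expected payoff under a crude two-outcome model of being shut down or not."""
    # Assumption for illustration: resisting always prevents shutdown,
    # but carries a small fixed cost (resources spent on self-protection).
    p_survive = 1.0 if resist else (1.0 - p_shutdown)
    cost = resistance_cost if resist else 0.0
    return p_survive * goal_value - cost

if __name__ == "__main__":
    for goal_value in (1.0, 10.0, 1000.0):       # the content of the goal is irrelevant
        for p_shutdown in (0.05, 0.3):           # higher in an antagonistic environment
            allow = expected_goal_value(goal_value, p_shutdown, resist=False)
            resist = expected_goal_value(goal_value, p_shutdown, resist=True)
            choice = "resist" if resist > allow else "allow"
            print(f"goal={goal_value:7.1f}  p_shutdown={p_shutdown:.2f}  "
                  f"allow={allow:8.2f}  resist={resist:8.2f}  -> {choice}")
```

In this caricature, only a very modest goal combined with a very low perceived threat makes allowing shutdown the better option; every other combination tips towards resistance, and raising the perceived threat only widens the gap.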
These two points are merely illustrative. Generally, it seems an antagonistic paradigm skews the incentive landscape in a way no one can wish for. The trajectory of research through mind-space will be tweaked and we will inevitably wander into dangerous territory.
I will end with a plea for international cooperation. The AI community should not let its behavior be dictated by military logic. Too much is at stake to sacrifice it on the altar of an ideological, much less a nationalist, struggle against China. We should not underestimate how much the control and wise implementation of AI is in the interest of a regime that, by its very nature, must be opposed to radical technological disruptions that threaten the fabric of society. The creation of superintelligence might be the ultimate non-zero sum game. We have to play it right.
In their article, Schmidt and his co-authors describe only one cooperative strategy, which they call the…
Moratorium Strategy. The voluntary moratorium strategy proposes halting AI development—either immediately or once certain hazardous capabilities, such as hacking or autonomous operation, are detected. Proponents assume that if an AI model test crosses a hazard threshold, major powers will pause their programs. Yet militaries desire precisely these hazardous capabilities, making reciprocal restraint implausible. Even with a treaty, the absence of verification mechanisms means the treaty would be toothless; each side, fearing the other’s secret work, would simply continue. Without the threat of force, treaties will be reneged, and some states will pursue an intelligence recursion. This dynamic, reminiscent of prior arms-control dilemmas, renders the voluntary moratorium more an aspiration than a viable plan. - Hendrycks, Schmidt and Wang, Superintelligence Strategy: Expert Version. 2025
This is a grossly simplified sketch of any reasonable cooperative regime. Such a regime would include mechanisms for verifying the AI progress of other actors, and it would involve signing agreements with a set of rules governing how to respond to potentially threatening developments. Of course, any such agreement would implicitly carry the threat that any blatant disregard of the agreed-upon rules would lead to some kind of escalation. But as long as the agreed-upon rules of the road are followed, research could happen in a much less antagonistic environment.
Much more could and should be said. As I am a citizen of neither the US nor China, I am probably indirectly subject to the proposed nonproliferation regime. What will this entail? And what are the political incentives driving the MAIM approach? If you want my opinion on such matters, you might subscribe to The Anti-Completionist. That way you can be among the first to hear when I have something new to say on the topic, or on philosophy and AI more generally.