Can AI Quantity beat AI Quality?

Published on October 2, 2024 3:21 PM GMT

AI definitely poses an existential risk, in the sense that it can generate models with the hidden (possibly undetectable?) intention of competing against humanity for resources. The more intelligent the model, the higher its chance of success!

The thought of an AI takeover is so scary that I won’t even try to imagine what its possible implications may be; instead, I want to focus on other scenarios that are easier to predict and discuss.

Unsafety as a Deterrent

Nuclear war is the most notorious modern existential risk, but - strangely enough - even an existential risk can be leveraged: nuclear weapons serve as a powerful deterrent against attack when posed as a threat of retaliation.

We should expect AI unsafety to be employed in the same way: agentic models are cheaper to train than nuclear weapons, and they will be highly effective at causing havoc (especially in the cloud) if left unopposed.

That’s not the end of the story: it is also to be expected that mild (?) rogue AIs will be deployed on purpose in the real world, to fight competitors in the private sector, and enemies in the armed forces. The illegality of such actions won’t discourage some big players from trying. That is the scenario that I am going to discuss in this post.

A Personal Remark

To be clear: I am not suggesting the development of unsafe AIs, nor am I happy about such an idea - quite the opposite. However, I foresee that the development of unsafe AIs will become routine in military establishments, simply because there is an incentive to use them as a cheap deterrent. My hope is that the UN will be able to establish an AI monopoly and break the vicious cycle of "tactical unsafety".

AI Proxy Wars

Imagine a world where all AI models are fully aligned with their developers’ intentions, but the developers’ intentions are hostile to other people. This world will see multiple groups of people competing with each other, and their AIs will act as proxies in their battle.

This is a world where rogue AIs are intentionally rogue, but only to enemy factions. This is also a world where alliances are forged and broken continuously - and yet diplomacy is completely decided by AI, not people!

This is mostly a zero-sum game that can be modelled as a probabilistic adversarial game in discrete time. The “resources” of the game can be imagined as controllable assets with specific monetary value (in discrete coins) and theft resistance, and the AIs will attempt to “steal” such resources from the opponents.

To steal a resource, an AI will engage another AI in a "match" whose result is probabilistic and depends on their relative intelligence; we are going to measure intelligence with an Elo rating[1].

Players “lose” the game when they don’t control resources anymore (likely because such resources were stolen)! Players “win” the game when they have no strong opponent left.
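As a rough illustration, the match mechanic above can be sketched in a few lines of Python. The function names and the use of the standard Elo expected-score formula are my own assumptions, not details fixed by the post:

```python
import random

# Sketch of a single "match": an attacker tries to steal a resource, and
# success probability follows the standard Elo expected-score formula.
# Names and parameters here are illustrative assumptions.

def win_probability(attacker_elo: float, defender_elo: float) -> float:
    """Expected score of the attacker under the Elo rating model."""
    return 1.0 / (1.0 + 10 ** ((defender_elo - attacker_elo) / 400.0))

def attempt_steal(attacker_elo: float, defender_elo: float, coins: int,
                  rng=random.random) -> int:
    """Coins stolen in one match: all of them on success, none on failure."""
    return coins if rng() < win_probability(attacker_elo, defender_elo) else 0
```

Equal ratings give a 50% chance of success, while a 200-point advantage raises it to roughly 76%.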

While the actual conflict has been modelled with a somewhat simplistic game[2], I believe it still captures the meaningful characteristics of digital battles.

“Digital battles” are not the only type of aggressive actions that AIs will consider: for example, bombing a data centre may be a sound option in AI’s view, but I am assuming that it will be relatively uncommon to see such a thing due to the potential public backlash.

AI Quantity vs AI Quality

Going back to the original title of this post, we now have a context where we can compare two alternative strategies: deploying a high volume of simple AI models (quantity), or a few state-of-the-art models (quality).

Let's simplify the game even further and consider N simple cooperative[3] AIs for player Bob matched against a single advanced AI for player Alice. Let p be the chance of success of stealing 1 coin for each simple AI, and q the chance of success of stealing C coins for the advanced AI. On average, the simple AIs will steal Np coins per turn while the advanced AI will steal qC coins per turn. Therefore, the high-volume strategy only works if Np > qC, while the state-of-the-art strategy only works if qC > Np.
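The expected-value comparison can be checked numerically; the parameter values below are arbitrary placeholders of my own, not values from the post:

```python
# Expected coins stolen per turn under each strategy (illustrative numbers).

def expected_simple(n: int, p: float) -> float:
    """n simple AIs, each stealing 1 coin with probability p."""
    return n * p

def expected_advanced(q: float, c: int) -> float:
    """One advanced AI stealing c coins with probability q."""
    return q * c

# 100 simple AIs at p = 0.2 beat one advanced AI stealing 50 coins at q = 0.3.
print(expected_simple(100, 0.2))   # 20.0
print(expected_advanced(0.3, 50))  # 15.0
```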

That seems to imply that both the strategies are valid when used in the correct context! What about a tactical perspective? Things look very complicated here: Alice has the advantage of using tactics that are simply unavailable to Bob, and vice versa. A few examples below.

The Blitzkrieg Tactic

If Alice succeeds in stealing most of Bob's resources at the start of the match in one lucky strike, then even if Bob stole some coins in the meanwhile, he may not be able to pay the cost of running his full army on the following turns - thus effectively losing to Alice soon after. This tactic is not available to Bob since, statistically, his high volume of agents keeps his success variance low.

The Guerrilla Tactic

If Bob attacked most of Alice's resources simultaneously, she would not be able to counter him effectively and - although the total damage at each turn is low - it would be continuous and slowly degrading. If Alice does not stop Bob as soon as possible, she will eventually be swarmed and defeated. This tactic is not available to Alice, since she is forced to focus her effort on a few targets at a time.

There are many other tactics that we can discuss, but my point is that the two approaches (quantity vs quality) seem to offer both advantages and disadvantages in terms of both strategy and tactics - and, therefore, there is no clear winner.
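The variance gap that drives both tactics can be demonstrated with a quick Monte Carlo simulation. The parameters below are my own illustrative assumptions, tuned so that both strategies steal 15 coins per turn on average:

```python
import random
import statistics

# Monte Carlo sketch of the variance gap between the two strategies.
# Both are tuned to an expected haul of 15 coins per turn, but their
# per-turn spreads differ widely.

def swarm_turn(n=100, p=0.15, rng=random):
    """Coins stolen in one turn by n simple AIs, each taking 1 coin w.p. p."""
    return sum(1 for _ in range(n) if rng.random() < p)

def advanced_turn(q=0.3, coins=50, rng=random):
    """Coins stolen in one turn by a single advanced AI: all-or-nothing."""
    return coins if rng.random() < q else 0

random.seed(0)
swarm = [swarm_turn() for _ in range(10_000)]
advanced = [advanced_turn() for _ in range(10_000)]

# The swarm's haul is steady; the advanced AI's haul swings wildly,
# which is exactly what makes a "lucky strike" possible.
print(statistics.mean(swarm), statistics.stdev(swarm))        # mean ~15, stdev ~3.6
print(statistics.mean(advanced), statistics.stdev(advanced))  # mean ~15, stdev ~23
```

The gap in standard deviation (roughly 3.6 vs 23 coins per turn here) is what makes the one-strike tactic available only to the advanced AI, and steady attrition available only to the swarm.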

As a side note, it would be very interesting to see a playable version of the game above and then train an RNN to master it.

Conclusion

Based on the discussion so far, it seems that having a few advanced AIs does not necessarily pay off in a war[4]. Similarly, having a high volume of simple AIs can also be a losing proposition. It is quite possible that the best overall strategy lies somewhere in the middle, where you have many tactics at your disposal and can also counter many others.

In an ideal world, we wouldn't need to worry about AI being used in this way: but such a world is yet to be built, and the current one points in a different direction.

 

Further Links

Control Vectors as Dispositional Traits (my first post)

All the Following are Distinct (my second post)

An Opinionated Look at Inference Rules (my previous post)

Who I am

My name is Gianluca Calcagni, born in Italy, with a Master of Science in Mathematics. I am currently (2024) working in IT as a consultant with the role of Salesforce Certified Technical Architect. My opinions do not reflect the opinions of my employer or my customers. Feel free to contact me on Twitter or Linkedin.

Revision History

[2024-10-02] Post published.

Footnotes

  1. ^

    Technically, TrueSkill is a better fit than Elo: the main idea is to assign a Gaussian distribution to each agent, where the mean represents its empirical skill level and the variance represents the uncertainty about its real skill level.

  2. ^

    There is still some level of ambiguity in the rules of this game, so I'd be happy if the community helped formalise them. The main problem is that the game doesn't take into account the cost of running the agents.

  3. ^

    I am implicitly modelling such agents as independent identically distributed discrete random variables.

  4. ^

    That does not mean that highly-intelligent AIs are not dangerous: they may still represent an existential risk for humanity, especially if undercover. My analysis is relevant only in the context presented in this post, where AI alignment is solved but used for aggression.



