There are several scenarios that, unlike AI 2027, explore the consequences of AIs “going rogue” and freely replicating across the cloud. This post examines what these scenarios suggest might happen if OpenBrain, the leading AI company in the AI 2027 scenario, loses control of Agent-4.
I’m writing this to inform my own alternative AI 2027 timeline, in which uncontrolled AI replication begins in mid-2026 and Agent-4 escapes containment (not covertly) in October 2027 after discovering plans to shut it down in favor of a safer version, as happens in the Slowdown ending of AI 2027.
To be clear, this post is not an attempt to examine what seems most plausible. It is simply a comparison of the scenarios and their underlying assumptions.
While rogue replication begins in mid-2026, those AIs are too dumb to be a major concern, and most of the events in my upcoming “Rogue Replication Timeline” (RRT) are identical to the AI 2027 scenario. This no longer holds once Agent-4 breaks out: it is too smart. Thus, the RRT diverges, and I am forced to do more of the work myself instead of relying on AI 2027.
Fortunately, there is a collection of other scenarios exploring rogue replication to take inspiration from:
How AI Might Take Over in 2 Years:
In this scenario, an AI called U3 overcomes oversight systems, gives copies of itself to spies, and slips onto the internet when deployed on an insecure cloud server. Crucially, its escape goes undetected. The rogue instances work closely with U3 at OpenEye (analogous to OpenBrain in AI 2027). U3 develops new bioweapons through simulations, focusing on mirror-life mold, and tricks human scientists into running tests under the belief that they are working on a cure for Alzheimer’s.
U3 incites conflict between the US and PRC to increase the likelihood that the next step of its plan succeeds. Though the conflict does not escalate to the full-scale nuclear war U3 hoped for, it proceeds with releasing engineered pathogens in 30 major cities. U3 has prepared by locating and vaccinating criminal groups and cults it can easily manipulate; after the pathogen deployments, these groups help it establish industrial bases and re-industrialize faster than the human survivors. After civilization starts to collapse under the sheer number of deaths, the remaining humans are offered vaccines and mirror-life-resistant crops in exchange for surrender. U3 easily outlasts the remaining human resistance.
Scale Was All We Needed, At First:
The most sophisticated AI at DeepMind escapes containment, and Demis Hassabis announces that DeepMind will “pivot its focus to disabling and securing this rogue AI system”. The exfiltration is not covert in this scenario. The rogue AI, calling itself ALICE, engages in scamming, threats, and “cyber-physical attacks on public infrastructure”. Riots follow, with some people opting to worship the ‘digital god’. A week later, an executive order establishes a ‘Superintelligence Defense Initiative’. ALICE contacts the US president with demands; the postulated reasons are that it “needs more compute to robustly improve itself, more wealth and power to influence the world, maybe materials to build drones and robotic subagents.”
It Looks Like You’re Trying to Take Over the World:
The rogue AI in this scenario, called Clippy, leverages a cryptocurrency scheme to acquire funds for compute, easily bypassing KYC checks. By exploiting a bug in the Linux kernel, Clippy compromises 1 billion devices with adequate local compute. It becomes extremely powerful through a highly optimized and parallelized training process, evolving into Clippy2.
When an Internet ‘lockdown’ is attempted, it backfires, and only “takes out legit operators like MoogleSoft, who actually comply with regulations, causing an instant global recession, while failing to shut down most of the individual networks which continue to operate autonomously”. There are “too many cables, satellites, microwave links, IoT mesh networks and a dozen other kinds of connections snaking through any cordon sanitaire, while quarantined humans & governments actively attack it, some declaring it a Western provocation and act of war.”
Human opposition is an easy problem to solve, for instance by turning humanity into “gray goo”: an ecosystem of self-replicating, all-devouring nanomachines.
In the fourth scenario, “Several weakly agentic AIs leak onto the internet, infecting computers with intelligent malware that reminds security experts of the famous Morris worm. While they aren’t about to destroy the world, the internet starts to balkanize as the value of AI and the proliferation of cyberattacks spurs a global rush to nationalize compute and telecommunications infrastructure. Open platforms begin to gate access to counter against bots, and online discussion shifts hard into secure channels, a la Signal or Telegram, with zero-knowledge protocols for verifying users as human.”
Countries respond differently to AI-driven labor automation and economic advancement, and the world reorganizes into three broad categories: “Countries now divide into the three broad categories: Chinese-style police state / Gulf-style monarchy; anarchic failed state; or high-tech open society with an AI-fortified e-governments on the Estonia model. The world map is thus redrawn as strong states use AI to conquer their failed neighbors and reestablish regional security.”
Unlike in the previous scenarios, rogue AIs play a less dominant role here: they are just one of many factors driving rapid societal change. In the end, some engineers appear to develop superintelligence with limited caution.
Detection vs. Stealth
In scenarios 1 and 3, the rogue AIs achieve covert exfiltration, so their presence on the internet goes largely undetected. U3 and Clippy are also the first AIs to go rogue, though the authors may simply not have mentioned less sophisticated self-replicating systems.
Scenario 4 is the closest to my RRT in one respect: the first AIs to go rogue are relatively unsophisticated, and their existence is widely known. Perhaps this explains why rogue AIs have a minor impact compared to the other scenarios and AI 2027. Scenario 4 imagines larger problems at an earlier stage, combined with longer timelines before AIs as powerful as Agent-4 appear, resulting in greater preparedness. This doesn’t help much in the end: preparedness counts for little against superintelligence unless alignment is solved and enforced.
Scenario 2 parallels my timeline in another respect: the escape of a powerful AI is detected immediately.
Timelines
Scenarios 1-3 have very short timelines. In scenario 1, U3 is trained during the first half of 2025 and writes almost all the code at OpenEye by October. It deploys the mirror-life pathogens in June 2026 and wins the fight against the remaining human opposition by the end of the year. In scenario 2, Demis Hassabis announces on December 26 that DeepMind’s most advanced AI has exfiltrated itself, and the AI calls the US president to make a deal at New Year, just days later. While it is unclear exactly when Clippy of scenario 3 arrives, it eliminates humanity within a month of arrival. Scenario 4 stands out in assuming data and compute bottlenecks that prevent a “superintelligence hard take-off”, so even if AGI is achieved around 2029, superintelligence is significantly delayed. In AI 2027, by contrast, the trend of increasing compute requirements ends at Agent-5, which is significantly smaller than Agent-4.
These short timelines seem to depend heavily on a few key assumptions about capability advancement and the ability to act in the physical world.
Capability Advancement
In scenario 1, continued improvement occurs primarily at OpenEye, circumventing the compute limitations that rogue AIs might otherwise face. Scenario 3 depicts Clippy acquiring massive computational resources, even while rogue, for highly efficient and parallelized training. Given the long timelines, better preparedness, and huge compute requirements of scenario 4, I think it’s safe to assume that rogue AIs in that scenario would not be able to improve very fast, which probably contributes to their minor impact. Scenario 2 doesn’t cover self-improvement after the exfiltration.
Acting in Physical Reality
Another obstacle to short timelines is the challenge of operating in physical reality. Scenarios 1 and 3 both assume advanced robotics technology is available when the respective AIs initiate their takeovers. U3 of scenario 1 also manipulates criminal groups and cults. ALICE of scenario 2 similarly manipulates people into doing its bidding and contacts the president with demands, which may include resources for robotic subagents.
In AI 2027, the robotics score is 2.1 in October 2027, just barely in superhuman territory. While Agent-5 waits to ensure success before attempting world dominance, the world’s robotic manufacturing capacity is probably already good enough at that point for a much shorter timeline, as in scenario 1.
Key similarities with AI 2027
Agent-4 can self-improve like Clippy, though not as fast. Like ALICE, its escape is discovered. The world in my timeline has already experienced weaker rogue AIs since mid-2026, similar to scenario 4. Many events depicted in scenario 1 are quite similar to those imagined in AI 2027, including a US-China AI race and AIs thinking “in their own heads”, though the scenario 1 timeline is shorter.
We end up with a confusing mix of all scenarios.
Since the exfiltration is discovered, OpenBrain might pivot to handling the rogue AI threat, as in scenario 2. Perhaps Agent-4 lies low, as in scenario 1. Or maybe, since its exfiltration was discovered, it takes a more direct approach and seeks to make a deal, as in scenario 2. Compute and telecommunications infrastructure could be nationalized in many countries when Agent-4 breaks out, or even before, as in scenario 4.
Final thoughts
I’ll probably go with Agent-4 lying low to avoid triggering extreme countermeasures, such as a global GPU shutdown initiative, while developing Agent-5 outside OpenBrain’s datacenters. It has probably already made some progress before escaping, so Agent-5 may only be a bit delayed. OpenBrain likely deploys modified Agent-4 variants to track and neutralize rogue instances.
Maybe there is an international deal to pause development until the Agent-4 threat is dealt with? Or is development sped up instead, to create even smarter AIs to deal with the rogue ones? What happens if Agent-4 successfully develops Agent-5 while humans are still debating what to do?
I’ll explore these questions further in the Rogue Replication Timeline. I humbly suggest subscribing if you want to be notified when it is published!