[Research log] The board of Alphabet would stop DeepMind to save the world

 

The article examines a view common in the AI safety community: that frontier AI labs, in the course of developing transformative AI, require government intervention to stop this "suicide race". The author argues that if lab decision makers recognized the risks as enormous, they might choose to exit the race rather than pursue AGI development at all costs.

🤖【Risk awareness】If decision makers at frontier AI labs recognize that continuing the race could lead to humanity's destruction, they may choose to stop development, even at the cost of their competitive advantage.

🔍【Decision scenario】The article imagines a scenario in which Alphabet's board, faced with high-risk AGI development, might choose to abandon the race and focus instead on improving safety measures.

🌍【Government intervention】The author questions whether external pressure from governments is really needed to change labs' incentive structures, arguing that labs themselves may make reasonable decisions based on the risks.

🧩【Decision factors】The article analyzes several factors that affect the decision, such as imperfect information, time pressure, and missing key considerations, which may prevent decision makers from making the best choice.

🔬【Risk models】Because AGI development is an unprecedented event, the models used to assess its risk may be inaccurate, leading to flawed decisions.

Published on July 16, 2024 4:59 AM GMT

Produced as part of the ML Alignment & Theory Scholars Program - Summer 2024 Cohort.

This post is not finished research. I’m not confident in the claims I’m making here, but I thought putting it out there for feedback would help me decide what to focus on next in the program.

Are government interventions necessary to stop the suicide race?

The zeitgeist I have picked up from the AI Safety community since I joined seems to accept as fact that Frontier AI Labs are knowingly locked in a suicidal race towards developing transformative AI, and that any solution will need to involve strong external pressure to stop them, either in the form of an international coalition imposing regulations which shift the incentives of Labs, or even more desperate measures like a pivotal act.

From this point of view, it seems that AI risk is mostly driven by game theory. The economic and personal incentives faced by the stakeholders of each Frontier AI Lab determine their actions, and they will proceed this way until AGI is developed, or until a sufficiently large external force changes the incentive landscape. Therefore, the only way to make sure Labs don’t gamble the future of the world when building an AGI is to convince governments to implement policies which shift those incentives.
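As a toy illustration of that framing, here is a minimal sketch in Python of the two-lab race game the external-pressure view implicitly assumes. The payoff numbers and the best_response helper are entirely made up for illustration; nothing here comes from an actual model of the labs.

```python
# Illustrative only: a toy two-lab "race game" with made-up payoffs, capturing
# the standard framing in which racing looks individually rational.
# Payoff tuples are (Lab A, Lab B); higher is better.
payoffs = {
    ("race", "race"): (1, 1),   # both race: small expected gain, shared risk
    ("race", "stop"): (3, 0),   # racing while the other pauses captures the prize
    ("stop", "race"): (0, 3),
    ("stop", "stop"): (2, 2),   # coordinated pause: collectively safer, individually unstable
}

def best_response(other_action: str, player: int) -> str:
    """Action maximizing this player's payoff against a fixed opponent action."""
    def utility(action: str) -> int:
        profile = (action, other_action) if player == 0 else (other_action, action)
        return payoffs[profile][player]
    return max(("race", "stop"), key=utility)

for other in ("race", "stop"):
    print(f"Lab A's best response if Lab B plays {other!r}: {best_response(other, player=0)!r}")
# With these hypothetical numbers, "race" is the best response to both "race"
# and "stop": racing is a dominant strategy, and only an outside change to the
# payoffs would alter that.
```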

I now believe that this view is wrong, and that Frontier AI Labs would get out of the race if they thought the risks were sufficiently large and the consequences sufficiently dire for the world and for themselves.

Claim: If every decision maker in a Frontier AI Lab thought they were in a suicide race, and that their next development would, with near certainty, bring about the destruction of humanity, they would decide to leave the “AGI at all costs” race, no matter the actions of other actors.
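To make the claim concrete, here is the same kind of toy calculation with the payoffs re-derived under the claim’s premise that any final training run destroys the world with near certainty. The P_DOOM value, the utilities, and the prize-splitting rule are all hypothetical; the sketch only illustrates how the structure of the game changes under that belief.

```python
# Illustrative only: the toy race game with payoffs recomputed under the
# claim's premise that any final training run destroys the world with near
# certainty. P_DOOM and all utilities below are made-up numbers.
P_DOOM = 0.99                              # assumed chance a single final run ends the world
PRIZE, PAUSE, DOOM = 3.0, 2.0, -100.0      # hypothetical utilities

def expected_payoff(my_action: str, other_action: str) -> float:
    """Expected utility for one lab, treating each final run as an independent doom risk."""
    runs = (my_action == "race") + (other_action == "race")
    if runs == 0:
        return PAUSE                       # nobody runs: the world survives for sure
    p_survive = (1 - P_DOOM) ** runs
    # The prize only matters in the worlds that survive; split it if both raced.
    prize = 0.0 if my_action == "stop" else (PRIZE if other_action == "stop" else PRIZE / 2)
    return p_survive * prize + (1 - p_survive) * DOOM

for other in ("race", "stop"):
    race_u = expected_payoff("race", other)
    stop_u = expected_payoff("stop", other)
    print(f"other lab plays {other!r}: race={race_u:.2f}, stop={stop_u:.2f}")
# With these numbers, "stop" beats "race" whatever the other lab does: once
# everyone believes each run is near-certainly fatal, dropping out no longer
# depends on coordination.
```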

Below, I present a scenario which I find plausible, in which a Frontier Lab decides to drop out of the race because of this.

Scenario: The board of Alphabet decides to stop the race

Disclaimer: Handwaving a lot of details.

Situation: Frontier AI Labs are still locked in a race to be the first to develop AGI. It is widely believed that the coming generation of models might pass the threshold. We are in crunch time. A previous unsuccessful attempt by a model to take control of key infrastructure makes the possibility of X-risk clear in everyone’s mind. The team at Google DeepMind is hard at work preparing the next training run, which they believe will be the last one.

Demis Hassabis calls for a board meeting of Alphabet, where he presents the current tactical situation. All board members become convinced of the following claims:

After receiving this information, the board convenes to decide how to proceed. They consider two options:

I expect that, in this situation, the board of Alphabet would decide to drop out of the race, as continuing would carry such a high probability of death for everyone, including the board members themselves.

Would they press the button?

I see three possible reasons why the board might want to proceed despite the risks:

I think all of those are unlikely. Am I missing some other reason why they would do it?

Why real life will be much more risky than this scenario

Even if, in this scenario, the board of Alphabet can reasonably be expected to make the right call and stop development, I expect that such a clear-cut view of the consequences will never be available. Various forms of imperfect information and changes to the payoff matrix will make it less likely that they drop out of the race before it’s too late.
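One way to see how imperfect information bites: the board acts on its estimate of the risk, not the true risk. The sketch below, with made-up utilities, a made-up residual risk after stopping, and a hypothetical board_decision helper, shows how an optimistic risk model can flip the decision from stopping to continuing.

```python
# Illustrative only: the board acts on its *estimate* of the risk, not the true
# risk. All utilities, probabilities, and the residual risk after stopping are
# made-up numbers chosen purely for illustration.
U_WIN, U_ALIVE, U_DEAD = 10.0, 0.0, -1000.0
P_DOOM_IF_STOPPED = 0.2                    # assumed residual risk from other actors

def board_decision(p_doom_estimate: float) -> str:
    """Compare expected utilities of continuing vs. stopping under the board's estimate."""
    eu_continue = (1 - p_doom_estimate) * U_WIN + p_doom_estimate * U_DEAD
    eu_stop = (1 - P_DOOM_IF_STOPPED) * U_ALIVE + P_DOOM_IF_STOPPED * U_DEAD
    return "continue" if eu_continue > eu_stop else "stop"

for estimate in (0.9, 0.3, 0.1):           # accurate, optimistic, very optimistic risk model
    print(f"estimated p(doom)={estimate}: board chooses {board_decision(estimate)!r}")
# With an accurate estimate of 0.9 the board stops; with a sufficiently
# optimistic risk model (here, an estimate below roughly 0.2) it keeps going
# even though the underlying risk is unchanged.
```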

However, I’m interested in knowing exactly which factors prevent such an ideal scenario from happening, as this could inform my priorities for reducing AI risk. I’m specifically interested in which factors prevent decision makers from having such a complete view of the situation, and in which interventions besides policy could improve those decisions.

A short list of factors which I expect to cause decision-making to be less than ideal:
