Proposing the Conditional AI Safety Treaty (linkpost TIME)

Published on November 15, 2024 1:59 PM GMT

Technological progress can excite us, politics can infuriate us, and wars can mobilize us. But faced with the risk of human extinction that the rise of artificial intelligence is causing, we have remained surprisingly passive. In part, perhaps this was because there did not seem to be a solution. This is an idea I would like to challenge.

AI’s capabilities are ever-improving. Since the release of ChatGPT two years ago, hundreds of billions of dollars have poured into AI. These combined efforts will likely lead to Artificial General Intelligence (AGI), where machines have human-like cognition, perhaps within just a few years.

Hundreds of AI scientists think we might lose control over AI once it gets too capable, which could result in human extinction. So what can we do?

The existential risk of AI has often been presented as extremely complex. A 2018 paper, for example, called the development of safe human-level AI a “super wicked problem.” This perceived difficulty had much to do with the proposed solution of AI alignment, which entails making superhuman AI act according to humanity’s values. AI alignment, however, was a problematic solution from the start.

First, scientific progress in alignment has been much slower than progress in AI itself. Second, the philosophical question of which values to align a superintelligence to is incredibly fraught. Third, it is not at all obvious that alignment, even if successful, would be a solution to AI’s existential risk. Having one friendly AI does not necessarily stop other unfriendly ones.

Because of these issues, many have urged technology companies not to build any AI that humanity could lose control over. Some have gone further; activist groups such as PauseAI have indeed proposed an international treaty that would pause development globally.

Many do not see that as politically palatable, since it may still take a long time before the missing pieces to AGI are filled in. And do we have to pause already, when this technology can also do a lot of good? Yann LeCun, AI chief at Meta and prominent existential risk skeptic, says that the existential risk debate is like “worrying about turbojet safety in 1920.”

On the other hand, technology can leapfrog. If we get another breakthrough such as the transformer, a 2017 innovation which helped launch modern Large Language Models, perhaps we could reach AGI in a few months’ training time. That’s why a regulatory framework needs to be in place before then.

Fortunately, Nobel Laureate Geoffrey Hinton, Turing Award winner Yoshua Bengio, and many others have provided a piece of the solution. In a policy paper published in Science earlier this year, they recommended “if-then commitments”: commitments to be activated if and when red-line capabilities are found in frontier AI systems.

Building upon their work, we at the nonprofit Existential Risk Observatory propose a Conditional AI Safety Treaty. Signatory countries of this treaty, which should include at least the U.S. and China, would agree that once we get too close to loss of control they will halt any potentially unsafe training within their borders. Once the most powerful nations have signed this treaty, it is in their interest to verify each other’s compliance, and to make sure uncontrollable AI is not built elsewhere, either.

One outstanding question is at what point AI capabilities are too close to loss of control. We propose to delegate this question to the AI Safety Institutes set up in the U.K., U.S., China, and other countries. They have specialized model evaluation know-how, which can be developed further to answer this crucial question. Also, these institutes are public, making them independent from the mostly private AI development labs. The question of how close is too close to losing control will remain difficult, but someone will need to answer it, and the AI Safety Institutes are best positioned to do so.

We can still get most of the benefits of AI under the Conditional AI Safety Treaty. All current AI is far below the loss-of-control level and would therefore be unaffected. Future narrow AIs suited to a single task, such as climate modeling or finding new medicines, would be unaffected as well. Even more general AIs could still be developed, provided labs can demonstrate to a regulator that their model’s loss-of-control risk is below, say, 0.002% per year (the safety threshold we accept for nuclear reactors). Other AI thinkers, such as MIT professor Max Tegmark, Conjecture CEO Connor Leahy, and ControlAI director Andrea Miotti, are thinking in similar directions.
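
To give a feel for what a threshold like 0.002% per year means over longer horizons, here is a minimal sketch of the arithmetic. It assumes the annual risk is constant and independent from year to year, which is a simplification of ours and not part of the treaty proposal; the function name `cumulative_risk` is purely illustrative.

```python
def cumulative_risk(annual_risk: float, years: int) -> float:
    """Probability of at least one loss-of-control event within `years`.

    Assumption (ours, for illustration): the annual risk is constant and
    each year is independent of the others.
    """
    return 1.0 - (1.0 - annual_risk) ** years


# 0.002% per year, the example threshold quoted in the text.
ANNUAL_RISK = 0.002 / 100

if __name__ == "__main__":
    for years in (1, 10, 100):
        risk = cumulative_risk(ANNUAL_RISK, years)
        print(f"{years:>3} years: {risk:.4%}")
```

Under these assumptions, a 0.002% annual risk corresponds to one expected event per 50,000 years, and the cumulative risk over a century stays just under 0.2%.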

Fortunately, the existential risks posed by AI are recognized by many close to President-elect Donald Trump. His daughter Ivanka seems to see the urgency of the problem. Elon Musk, a critical Trump backer, has been outspoken about the civilizational risks for many years, and recently supported California’s legislative push to safety-test AI. Even the right-wing Tucker Carlson provided common-sense commentary when he said: “So I don’t know why we’re sitting back and allowing this to happen, if we really believe it will extinguish the human race or enslave the human race. Like, how can that be good?” For his part, Trump has expressed concern about the risks posed by AI, too.

The Conditional AI Safety Treaty could provide a solution to AI’s existential risk, while not unnecessarily obstructing AI development right now. Getting China and other countries to accept and enforce the treaty will no doubt be a major geopolitical challenge, but perhaps a Trump government is exactly what is needed to overcome it.

A solution to one of the toughest problems we face—the existential risk of AI—does exist. It is up to us whether we make it happen, or continue to go down the path toward possible human extinction.

The title of this piece has been adapted to increase clarity for a different audience.
