AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions

The MIRI Technical Governance Team has released a new AI governance research agenda that aims to describe the strategic landscape of AI development and to catalog important governance research questions. The agenda is organized around four high-level scenarios: an internationally coordinated halt to dangerous AI development (Off Switch and Halt), a US National Project that dominates global AI development, private-sector development with limited government intervention, and nations keeping AI capabilities at safe levels through mutual threats of interference. The report focuses on the Off Switch scenario: building the technical, legal, and institutional infrastructure needed to restrict dangerous AI development and deployment, ultimately enabling an internationally coordinated pause on AI activities. The agenda is intended to assist the US national security community and the AI governance research ecosystem in preventing catastrophic outcomes.

🛑 Off Switch and Halt: This scenario envisions the world coordinating to develop the ability to monitor and restrict dangerous AI activities, which may ultimately lead to a global pause on the development and deployment of frontier AI systems until the field has sufficient confidence that progress can resume without catastrophic risk. This requires building the necessary technical, legal, and institutional infrastructure: an Off Switch for AI.

🇺🇸 US National Project: This scenario envisions the US government racing to develop advanced AI systems and establishing unilateral control over global AI development. It faces many challenges, including maintaining its lead, preventing AI capabilities from proliferating to terrorists, avoiding war, overcoming hardware or software constraints, and avoiding loss of control of AI.

💡 Light-Touch: This scenario resembles the current world, in which governments take a light-touch approach to regulating AI companies. The report argues that this situation is unsustainable in the long run: as AI becomes strategically important both militarily and economically, governments will become more involved in AI development. In addition, the open release of highly capable AI models could increase the likelihood of malicious actors causing large-scale harm.

Published on May 1, 2025 10:46 PM GMT

We’re excited to release a new AI governance research agenda from the MIRI Technical Governance Team. With this research agenda, we have two main aims: to describe the strategic landscape of AI development and to catalog important governance research questions. We base the agenda around four high-level scenarios for the geopolitical response to advanced AI development. Our favored scenario involves building the technical, legal, and institutional infrastructure required to internationally restrict dangerous AI development and deployment (which we refer to as an Off Switch), which leads into an internationally coordinated Halt on frontier AI activities at some point in the future. This blog post is a slightly edited version of the executive summary.

We are also looking for someone to lead our team and work on these problems; please reach out here if you think you’d be a good fit.


The default trajectory of AI development has an unacceptably high likelihood of leading to human extinction. AI companies such as OpenAI, Anthropic, and Google DeepMind aim to develop AI systems that exceed human performance at most cognitive tasks, and many experts think they will succeed in the next few years. This development alone will be one of the most transformative and potentially destabilizing events in human history—similar in scale to the Industrial Revolution. But the most worrisome challenges arise as these systems grow to substantially surpass humanity in all strategically relevant activities, becoming what is often referred to as artificial superintelligence (ASI).

In our view, the field of AI is on track to produce ASI while having little to no understanding of how these systems function and no robust means to steer and control their behavior. AI developers, policymakers, and the public seem radically unprepared for the coming impacts of AI. There is a fraught interplay between the risks posed by AI systems themselves and the surrounding geopolitical situation. The coming years and decades thus present major challenges if we are to avoid large-scale risks from advanced AI systems, including:

- loss of control to misaligned AI systems;
- misuse of advanced AI by malicious actors, for instance to develop biological weapons;
- war between major powers as AI destabilizes the balance of power; and
- lock-in of harmful values or concentrations of power.

This report provides an extensive collection of research questions intended to assist the US National Security community and the AI governance research ecosystem in preventing these catastrophic outcomes.

The presented research agenda is organized by four high-level scenarios for the trajectory of advanced AI development in the coming years: a coordinated global Halt on dangerous AI development (Off Switch and Halt), a US National Project leading to US global dominance (US National Project), continued private-sector development with limited government intervention (Light-Touch), and a world where nations maintain AI capabilities at safe levels through mutual threats of interference (Threat of Sabotage). We explore questions about the stability and viability of the governance strategy that underlies each scenario: What preconditions would make each scenario viable? What are the main difficulties of the scenario? How can these difficulties be reduced, including by transitioning to other strategies? What is a successful end-state for the scenario?

A rough tree of the scenarios discussed in this report and how AI governance paradigms may evolve. This diagram assumes catastrophe is avoided at each step, so the myriad failures are omitted. This is simplified, and one could draw many more connections between the scenarios.

Off Switch and Halt

The first scenario we describe involves the world coordinating to develop the ability to monitor and restrict dangerous AI activities. Eventually, this may lead to a Halt: a global moratorium on the development and deployment of frontier AI systems until the field achieves justified confidence that progress can resume without catastrophic risk. Achieving the required level of understanding and assurance could be extraordinarily difficult, potentially requiring decades of dedicated research.

While consensus may currently be lacking about the need for such a Halt, it should be uncontroversial that humanity should be able to stop AI development in a coordinated fashion if or when it decides to do so. We refer to the technical, legal, and institutional infrastructure needed to Halt on demand as an Off Switch for AI.

We are focused on an Off Switch because we believe an eventual Halt is the best way to reduce Loss of Control risk from misaligned advanced AI systems. Therefore, we would like humanity to build the capacity for a Halt in advance. Even for those skeptical about alignment concerns, there are many reasons Off Switch capabilities would be valuable. These Off Switch capabilities—the ability to monitor, evaluate, and, if necessary, enforce restrictions on frontier AI development—would also address a broader set of national security concerns. They would assist with reducing risks from terrorism, geopolitical destabilization, and other societal disruption. As we believe a Halt is the most credible path to avoiding human extinction, the full research agenda focuses primarily on the Off Switch and Halt scenario.

The research questions in this scenario primarily concern the design of an Off Switch and the key details of enforcing a Halt. For example:

US National Project

The second scenario is the US National Project, in which the US government races to develop advanced AI systems and establish unilateral control over global AI development. This scenario is based on stories discussed previously by Leopold Aschenbrenner and Anthropic CEO Dario Amodei. A successful US National Project requires navigating numerous difficult challenges:

- maintaining a lead over rival projects;
- preventing AI capabilities from proliferating to terrorists and other malicious actors;
- avoiding war;
- overcoming hardware and software constraints; and
- avoiding loss of control of the AI systems themselves.

Some of these challenges look so difficult that pursuing the project would be unacceptably dangerous. We encourage other approaches, namely coordinating a global Off Switch and halting dangerous AI development.

The research questions in this section examine how to prepare for and execute on the project, along with approaches for pivoting away from a National Project and into safer strategies. For example:

Light-Touch

Light-Touch is similar to the current world, where the government takes a light-touch approach to regulating AI companies. We have not seen a credible story for how this situation is stable in the long run. In particular, we expect governments to become more involved in AI development as AIs become strategically important, both militarily and economically. Additionally, the default trajectory will likely involve the open release of highly capable AI models. Such models would drastically increase large-scale risks from malicious actors, for instance, by assisting with biological weapon development. One approach discussed previously to remedy this situation is defensive acceleration: investing heavily in defense-oriented technologies in order to combat offensive use. We are pessimistic about such an approach because, for some emerging technologies such as biological weapons, offense appears far easier than defense. The Light-Touch approach also involves risks similar to those in the US National Project, such as misalignment and war. We think the Light-Touch scenario is extremely unsafe.

The research questions in this section are largely about light government interventions to improve the situation or transitioning into an Off Switch strategy or US National Project. For example:

Threat of Sabotage

AI progress could disrupt the balance of power between nations (e.g., by enabling a decisive military advantage), so countries might take substantial actions to interfere with advanced AI development. Threat of Sabotage, similar to Mutual Assured AI Malfunction (MAIM) described in Superintelligence Strategy, describes a strategic situation in which AI development is slow because countries threaten to sabotage rivals’ AI progress. Actual sabotage may occur, although the threat alone could be sufficient to keep AI progress slow. The state of thinking about this scenario is nascent, and we are excited to see further analysis of its viability and implications.

One of our main concerns is that the situation only remains stable if both visibility into AI projects and the potential for sabotage remain high, and these are complicated factors that are difficult to predict in advance. Visibility and potential for sabotage are both likely high in the current AI development regime, where frontier AI training requires many thousands of advanced chips, but this situation could change.

The research questions in this section focus on better understanding the viability of the scenario and transitioning into a more cooperative Off Switch scenario. For example:  

Understanding the World

Some research projects are generally useful for understanding the strategic situation and gaining situational awareness. We include them in this section because they are broadly useful across many AI development trajectories. For example:

Outlook

Humanity is on track to soon develop AI systems smarter than the smartest humans; this might happen in the 2020s or 2030s. On both the current trajectory and some of the most likely variations (such as a US National Project), there is an unacceptably large risk of catastrophic harm. Risks include terrorism, world war, and risks from AI systems themselves (i.e., loss of control due to AI misalignment).

Humanity should take a different path. We should build the technical, legal, and institutional infrastructure needed to halt AI development if and when there is political will to do so. This Halt would provide the conditions for a more mature AI field to eventually develop this technology in a cautious, safe, and coordinated manner.

We hope this agenda can serve as a guide to others working on reducing large-scale AI risks. There is far too much work for us to do alone. We will make progress on only a small subset of these questions, and we encourage other researchers to work on these questions as well.

If you're interested in collaborating with us, joining our team, or even leading our work on these critical problems, please contact us.

| Scenario | Pros | Cons | Loss of Control | Misuse | War | Bad Lock-in |
| --- | --- | --- | --- | --- | --- | --- |
| Off Switch and Halt | Careful AI development; international legitimacy; slow societal disruption | Difficult to implement; not a complete strategy; slow AI benefits | Low | Low | Low | Mid |
| US National Project | Centralized ability to implement safeguards; limited proliferation | Arms race; breaking international norms | High | Low | High | High |
| Light-Touch | Fast economic benefit; less international provocation; easy to implement (default) | Corporate racing; proliferation; limited controls available; untenable | High | High | Mid | Mid |
| Threat of Sabotage | Slower AI development; limited cooperation needed | Ambiguous stability; escalation | Mid | Low | High | Mid |

A table of the main pros and cons of the scenarios and how much of each core risk they involve. Ratings are based on our analysis of the scenarios.


