New AI safety treaty paper out!

Published by the Existential Risk Observatory, this article examines the potential risks posed by advanced general-purpose AI (GPAI), including loss of control, and argues for an international agreement to address them. It reviews recent proposals for international agreements on AI safety, focusing on risk thresholds, regulatory measures, types of international agreement and related processes. The authors recommend a compute threshold above which AI development would be subject to rigorous oversight, including audits of models, information security and governance practices, overseen by an international network of AI Safety Institutes (AISIs) empowered to pause development when risks become unacceptable. The article also raises systemic AI risks and the offense/defense balance, and calls for further research and discussion.

🛡️ Proposes a compute threshold as a first "filter" for identifying potentially high-risk AI models; development above the threshold should be subject to rigorous regulation to ensure safety.

✅ Proposes "model audits" of above-threshold models, including evaluations and red-teaming by AISIs or other public bodies, carried out at regular intervals during development to verify model safety.

🏢 Recommends security and governance audits of developers whose models exceed the threshold; governance audits would assess whether a developer has adequate risk-management procedures and a "safety culture".

🌍 Stresses the importance of verification measures, including monitoring transfers of AI-relevant hardware and reporting on the use of cloud compute; involving international institutions could help reduce suspicion and conflict between states.

Published on March 26, 2025 9:29 AM GMT

Last year, we (the Existential Risk Observatory) published a Time Ideas piece proposing the Conditional AI Safety Treaty, under which AI development would be paused when AI safety institutes determine that its risks, including loss of control, have become unacceptable. Today, we publish our paper on the topic: “International Agreements on AI Safety: Review and Recommendations for a Conditional AI Safety Treaty”, by Rebecca Scholefield and myself (both Existential Risk Observatory) and Samuel Martin (unaffiliated).

We would like to thank Tolga Bilge, Oliver Guest, Jack Kelly, David Krueger, Matthijs Maas and José Jaime Villalobos for their insights (their views do not necessarily correspond to the paper).

Read the full paper here.

Abstract

The malicious use or malfunction of advanced general-purpose AI (GPAI) poses risks that, according to leading experts, could lead to the “marginalisation or extinction of humanity.”[1] To address these risks, there are an increasing number of proposals for international agreements on AI safety. In this paper, we review recent (2023-) proposals, identifying areas of consensus and disagreement, and drawing on related literature to indicate their feasibility.[2] We focus our discussion on risk thresholds, regulations, types of international agreement and five related processes: building scientific consensus, standardisation, auditing, verification and incentivisation. 

Based on this review, we propose a treaty establishing a compute threshold above which development requires rigorous oversight. This treaty would mandate complementary audits of models, information security and governance practices, to be overseen by an international network of AI Safety Institutes (AISIs) with authority to pause development if risks are unacceptable. Our approach combines immediately implementable measures with a flexible structure that can adapt to ongoing research.

Treaty recommendations

(Below are our main treaty recommendations. For our full recommendations, please see the paper.)

The provisions of the treaty we recommend are listed below. The treaty would ideally apply to models developed in the private and public sectors, for civilian or military use. To be effective, states parties would need to include the US and China.

- A compute threshold, serving as a first filter for identifying potentially dangerous models; development above the threshold would be subject to rigorous oversight.
- Model audits of above-threshold models, including evaluations and red-teaming by AISIs or other public bodies, conducted at regular intervals during development.
- Information security and governance audits of above-threshold developers, assessing, among other things, risk-management procedures and safety culture.
- Verification measures, such as monitoring transfers of AI-relevant hardware and reporting on the use of cloud compute.
- Oversight by an international network of AISIs, with the authority to pause development if risks become unacceptable.
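To illustrate how a compute threshold could act as a first-pass filter, below is a minimal sketch in Python. It assumes the commonly used ~6 × parameters × training-tokens approximation for training FLOP and a hypothetical threshold of 10²⁶ FLOP; neither the code nor these exact numbers comes from the paper, and the actual threshold value would be set by the states parties.

```python
# Minimal sketch of a compute-threshold filter (illustrative only).
# Assumptions not taken from the paper: the ~6 * N * D training-FLOP
# approximation for dense transformers, and a threshold of 1e26 FLOP.

ASSUMED_THRESHOLD_FLOP = 1e26  # hypothetical value; the treaty would fix the actual number


def estimated_training_flop(n_parameters: float, n_training_tokens: float) -> float:
    """Rough training-compute estimate: ~6 FLOP per parameter per training token
    (forward plus backward pass)."""
    return 6.0 * n_parameters * n_training_tokens


def requires_oversight(n_parameters: float, n_training_tokens: float,
                       threshold_flop: float = ASSUMED_THRESHOLD_FLOP) -> bool:
    """First-pass filter: does the planned run exceed the compute threshold and
    therefore trigger audits, reporting and AISI oversight?"""
    return estimated_training_flop(n_parameters, n_training_tokens) >= threshold_flop


if __name__ == "__main__":
    # Example: a hypothetical 1-trillion-parameter model trained on 20T tokens.
    n_params, n_tokens = 1e12, 2e13
    flop = estimated_training_flop(n_params, n_tokens)  # 1.2e26 FLOP
    print(f"Estimated training compute: {flop:.1e} FLOP")
    print("Rigorous oversight required:", requires_oversight(n_params, n_tokens))
```

In practice, such a filter would operate on reported or verified compute figures rather than a developer's own estimate, which is one role of verification measures such as hardware tracking and cloud compute reporting.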

Existential Risk Observatory follow-up research

Systemic AI risks

We have worked out the contours of a possible treaty to reduce AI existential risk, specifically loss of control. Systemic risks, such as gradual disempowerment, geopolitical risks of intent-aligned superintelligence (see, for example, the interesting recent work on MAIM), mass unemployment, stable extreme inequality, planetary boundaries and climate, and others, have so far been out of scope. Some of these risks, however, are also hugely important for the future of humanity. Therefore, we might do follow-up work to address them as well, perhaps in a framework convention proposal.

Defense against offense

Many AI alignment projects seem to expect that achieving reliably aligned AI will reduce the chance that someone else creates unaligned, takeover-level AI. Historically, some were convinced that AGI would lead directly to ASI via a fast takeoff, and that such ASI would automatically block other takeover-level AIs, making only the alignment of the first AGI relevant. While we acknowledge this as one possibility, we think another is also realistic: aligned AI that is powerful enough to help defend against unaligned ASI, yet not powerful enough, or not authorized, to monopolize all ASI attempts by default. In such a multipolar world, the offense/defense balance becomes crucial.

Although many AI alignment projects seem to rely on the offense/defense balance favoring defense, little work has so far been done to determine whether this assumption holds, or to flesh out what such defense could look like. A follow-up research project would aim to shed light on these questions.

We are happy to engage in follow-up discussion either here or via email: info@existentialriskobservatory.org. If you want to support our work and make additional research possible, please consider donating via our website or by reaching out to the email address above; we are funding-constrained.

We hope our work can contribute to the emerging debate on what global AI governance, and specifically an AI safety treaty, should look like!

  1. ^

    Yoshua Bengio and others, International AI Safety Report (2025) <https://www.gov.uk/government/publications/international-ai-safety-report-2025>, p.101.

  2. ^

    For a review of proposed international institutions specifically, see Matthijs M. Maas and José Jaime Villalobos, ‘International AI Institutions: A Literature Review of Models, Examples, and Proposals,’ AI Foundations Report 1 (2023) <http://dx.doi.org/10.2139/ssrn.4579773>.

  3. ^

    Heim and Koessler, ‘Training Compute Thresholds,’ p.3.

  4. ^

    See paper Section 1.1, footnote 13.

  5. ^

    Cass-Beggs and others, ‘Framework Convention on Global AI Challenges,’ p.15; Hausenloy, Miotti and Dennis, ‘Multinational AGI Consortium (MAGIC)’; Miotti and Wasil, ‘Taking control,’ p.7; Treaty on Artificial Intelligence Safety and Cooperation.

  6. ^

    See Apollo Research, Our current policy positions (2024) <https://www.apolloresearch.ai/blog/our-current-policy-positions> [accessed Feb 25, 2025].

  7. ^

    For example, the U.S. Nuclear Regulatory Commission set a quantitative goal for a Core Damage Frequency of less than 1 × 10⁻⁴ per year. United States Nuclear Regulatory Commission, ‘Risk Metrics for Operating New Reactors’ (2009) <https://www.nrc.gov/docs/ML0909/ML090910608.pdf>.

  8. ^

    See paper Section 1.2.3.


