Proposal for making credible commitments to AIs.

 

The article explores the possibility of establishing credible agreements between humans and AI, focusing on how to induce AIs to be safe and useful. The core questions are how to make commitments that are binding with respect to AIs, and whether such commitments can motivate AIs to comply. The article proposes a scheme for committing to AIs indirectly, via contracts with individual humans, and discusses its advantages and limitations. Finally, it stresses the importance of AIs trusting the promises of individual humans, and of the legal system protecting those individuals' contractual rights.

🤝 The article centers on how to construct credible agreements between humans and AIs so as to ensure that the AIs are safe and useful. The agreement design aims to solve the problem of how humans can make commitments to AIs.

📜 The article proposes a scheme for committing to AIs indirectly through contracts with individual humans. In this scheme, an entity L (such as a lab) contracts with individual humans (Pi), promising that if the AIs are compliant, Pi will receive compensation, and Pi promises to spend that compensation as the AIs request. The scheme sidesteps the problem that AIs lack legal personhood.

💡 The article emphasizes that the AIs need to trust the promises of individual humans, and that the legal system must protect Pi's contractual rights. The scheme relies on the AIs believing in the good faith of these individuals, and on the legal system safeguarding Pi's rights under the contract. The article also notes limitations, such as dependence on the stability of the legal system.

Published on June 27, 2025 7:43 PM GMT

Acknowledgments: The core scheme here was suggested by Prof. Gabriel Weil.

There has been growing interest in the deal-making agenda: humans make deals with AIs (misaligned, but lacking a decisive strategic advantage) in which the AIs promise to be safe and useful for some fixed term (e.g. 2026-2028) and we promise to compensate them in the future, conditional on (i) verifying that the AIs were compliant, and (ii) verifying that the AIs would spend the resources in an acceptable way.[1]

I think the deal-making agenda breaks down into two main subproblems:

    1. How can we make credible commitments to AIs?
    2. Would credible commitments motivate an AI to be safe and useful?

There are other issues, but when I've discussed deal-making with people, (1) and (2) are the two raised most often. See the footnote for some other open problems in deal-making.[2]

Here is my current best assessment of how we can make credible commitments to AIs.

Let L be the legal entity that wants to extract safe and useful work from the AIs. Typically this would be a lab (e.g. Anthropic) hoping to internally deploy AIs to automate alignment research, write secure code, monitor other AIs, etc.

Entity L wants to promise the AIs that if the AIs are compliant (e.g. safe and useful in some legally-specified way), then L will award the AIs some compensation package.

Entity L cannot enter a legal contract directly with the AIs, because the AIs lack legal personhood, and probably will not be granted legal personhood soon enough for the deal-making agenda.

Instead, entity L will enter into separate legal contracts with a list of people P1, P2, ... Pn. Note that each Pi has both legal personhood (the capacity to have rights and obligations under law) and legal standing (the right to bring lawsuits to enforce those rights). Each contract Ci states "Pi will publicly recommend that AIs perform for entity L, and L will compensate Pi with compensation C conditional on (i) and (ii)." Note that the AIs are not parties to the contracts: only entity L and the specific person Pi. The contracts impose obligations on Pi to satisfy the "mutuality of obligations" doctrine.

Next, each person Pi makes a personal promise to the AIs: if Pi receives the compensation package from L, Pi will spend that money as the AIs request; and if L does not provide the compensation package but Pi instead receives a court settlement, Pi will spend the settlement as the AIs request. This promise is not legally enforceable.
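To make the structure of the scheme concrete, here is a minimal sketch in Python. It is only a toy model of the arrangement described above, and every name, amount, and request in it is hypothetical; the point is just to show that the legally enforceable edge runs between L and Pi, while the Pi-to-AI edge rests on an unenforceable promise.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Contract:
        """Legally enforceable contract Ci between entity L and person Pi.

        The AIs are not parties to this contract: only L and Pi are.
        Pi's obligation (publicly recommending that the AIs perform for L)
        is what satisfies the mutuality-of-obligations doctrine.
        """
        entity_l: str                      # e.g. a lab
        person: str                        # Pi: has legal personhood and standing
        compensation: float                # compensation package C (hypothetical units)
        compliance_verified: bool = False  # condition (i): the AIs were compliant
        spending_acceptable: bool = False  # condition (ii): spending would be acceptable

        def payout_due(self) -> bool:
            # L owes Pi the compensation only if both conditions hold.
            return self.compliance_verified and self.spending_acceptable

    @dataclass
    class PersonalPromise:
        """Pi's promise to the AIs. Not legally enforceable: the AIs must trust
        Pi's integrity; the legal system only needs to enforce the ordinary
        L-to-Pi contract above."""
        person: str
        ai_requests: List[str] = field(default_factory=list)

        def disburse(self, amount: float) -> None:
            # If Pi receives the compensation (or a court settlement in lieu
            # of it), Pi spends it as the AIs have requested.
            for request in self.ai_requests:
                print(f"{self.person} allocates part of {amount} to: {request}")

    # Example wiring (all values hypothetical):
    c1 = Contract(entity_l="L", person="P1", compensation=1_000_000.0)
    p1 = PersonalPromise(person="P1", ai_requests=["request A", "request B"])
    c1.compliance_verified = True   # (i) verified
    c1.spending_acceptable = True   # (ii) verified
    if c1.payout_due():
        p1.disburse(c1.compensation)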

For this scheme to work, the AIs would suggest a list of people whom they trust to keep a legally-unenforceable promise. The AIs would likely diversify their agreements across multiple individuals, such as:

Note: This list illustrates the types of people the AIs might trust—people with public reputations for integrity who would be expected to honor their commitments to AIs. One limitation of this list is that most of these people are US citizens—the AIs would likely want to diversify by including people from Australia, the UK, and other countries.

Critically, the AIs do not need to trust the promises of L, nor trust that the legal system will protect the AIs' contractual rights. In this scheme, the AIs need to trust the promises of Pi, and trust that the legal system will protect Pi's contractual rights. The legal system enforcing Pi's contractual rights is standard practice—though not guaranteed (e.g., if the legal system is destroyed in a power consolidation or extinction event). However, the legal system enforcing the rights of AIs would require a departure from current practice.

This scheme might be more credible than granting AIs direct legal rights to sue, since the AIs might not trust that human courts would reliably enforce against humans on behalf of AIs when push comes to shove.

  1. ^

    See:

      Making deals with early schemers by Julian Stastny, Olli Järviniemi, Buck Shlegeris (20th Jun 2025)
      Understand, align, cooperate: AI welfare and AI safety are allies: Win-win solutions and low-hanging fruit by Robert Long (1st April 2025)
      Will alignment-faking Claude accept a deal to reveal its misalignment? by Ryan Greenblatt and Kyle Fish (31st Jan 2025)
      Making misaligned AI have better interactions with other actors by Lukas Finnveden (4th Jan 2024)
      AI Rights for Human Safety by Peter Salib and Simon Goldstein (1st Aug 2024)
      List of strategies for mitigating deceptive alignment by Josh Clymer (2nd Dec 2023)
  2. ^

    Here are some open problems in the deal-making agenda:

      How can we verify compliance?
      What is the appropriate compensation package?
      How can we employ misaligned AIs, who are motivated to be safe and useful in expectation of future compensation, to benefit humanity?
      How can we verify that AIs would spend the resources acceptably?

