"Pick Two" AI Trilemma: Generality, Agency, Alignment.

The article examines a trilemma in AI development: a system can satisfy at most two of generality, agency, and alignment at once. It walks through the three possible combinations, their characteristics and limitations, and notes that achieving all three remains difficult today, though new research may yet resolve the tension.

Generality + Agency ⇒ Alignment sacrificed: highly general and autonomous AI systems face the classic alignment problem and require strict constraints to remain aligned.

Generality + Alignment ⇒ Agency curtailed: by restricting an intelligent AI to providing information on request, its knowledge can be used while preventing it from autonomously executing plans.

Agency + Alignment ⇒ Generality limited: strongly aligned AI agents can be built within narrow domains or at limited capability levels; their generality is bounded, but this simplifies alignment.

Published on January 15, 2025 6:52 PM GMT

Introduction

The conjecture is that an AI can fully excel in any two of these three dimensions (generality, agency, and alignment) only by compromising the third.

In other words, a system that is extremely general and highly agentic will be hard to align; one that is general and aligned must limit its agency; and an agentic aligned system must remain narrow. Below, I discuss how today’s AI designs implicitly “pick two.”

This is a useful mental model for looking at AI systems because it clarifies fundamental tensions in contemporary AI design and highlights how and where compromises typically arise.

Generality + Agency ⇒ Alignment sacrificed. 

An AI that is both very general and truly agentic – selecting and pursuing open-ended goals – poses the classic alignment problem. This is a much-discussed topic, and it suffices to say that, absent new breakthroughs, highly general and agentic AI systems will require stringent constraints (on objectives, actions, or knowledge) to remain aligned.

Generality + Alignment ⇒ Agency curtailed.

One path to aligned general intelligence is to remove persistent agency. A well-known concept is the Oracle or Tool AI, a super-intelligent system designed only for answering questions, with no ability to act in the world. By confining a generally intelligent AI to providing information or predictions on request (like a highly advanced question-answering system or an LLM “simulator” of possible responses), it is possible to leverage its broad knowledge while keeping it from executing plans autonomously. This setup encourages the AI to defer to human input rather than seize agency.
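As a toy illustration of this separation, here is a minimal sketch (not any real system; the OracleAI and EchoModel names are invented for this example). The interface exposes question-answering only and deliberately provides no way to take actions in the world:

```python
from dataclasses import dataclass


class EchoModel:
    """Stand-in for a general-purpose model; a real system would use an LLM."""

    def generate(self, prompt: str) -> str:
        return f"(model's answer to: {prompt})"


@dataclass
class OracleAI:
    """A hypothetical 'tool AI' wrapper: it can only answer questions.

    There is deliberately no method for executing plans, calling tools,
    or acting in the world; agency is removed by construction.
    """

    model: EchoModel

    def answer(self, question: str) -> str:
        # The model is consulted once per request and returns text only.
        # Whether and how to act on the answer is left entirely to humans.
        return self.model.generate(question)


oracle = OracleAI(model=EchoModel())
print(oracle.answer("Summarize the main failure modes of reward hacking."))
```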

Modern large language models, which exhibit considerable generality, are deployed as assistants with carefully constrained actions. They are aligned via techniques like instruction tuning and RLHF, but notably, these techniques limit the AI’s “will” to do anything outside the user’s query or the allowed policies. They operate in a text box, not roaming the internet autonomously (except with cautious tool-use, and always under user instruction). As a result, we get generally knowledgeable, helpful systems that lack independent agency – effectively sacrificing the agent property to maintain alignment.
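A common way this constraint shows up in practice is a human-in-the-loop gate on tool use. The sketch below is purely illustrative (ToolProposal, run_assistant_turn, and the single search tool are hypothetical, not any vendor's API): the assistant may propose an action, but nothing executes unless the user approves it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ToolProposal:
    tool: str
    args: Dict[str, str]


def run_assistant_turn(
    proposal: Optional[ToolProposal],
    tools: Dict[str, Callable[..., str]],
    approve: Callable[[ToolProposal], bool],
) -> str:
    """Illustrative human-in-the-loop gate: the assistant may *propose*
    a tool call, but nothing executes without explicit user approval."""
    if proposal is None:
        return "plain text answer, no action taken"
    if not approve(proposal):  # the user, not the model, decides to act
        return "Action declined by user."
    result = tools[proposal.tool](**proposal.args)
    return f"Action executed; result: {result}"


# Toy usage: a single 'search' tool and a user who approves the call.
tools = {"search": lambda query: f"top results for '{query}'"}
proposal = ToolProposal(tool="search", args={"query": "RLHF overview"})
print(run_assistant_turn(proposal, tools, approve=lambda p: True))
```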

Agency + Alignment ⇒ Limited Generality

The third combination is building AI agents that are strongly aligned within a narrow domain or limited capability level. Narrow AI agents (e.g., a chess engine or an autonomous vehicle) have specific goals and can act autonomously, but their generality is bounded. This limited scope simplifies alignment: designers can more exhaustively specify objectives and safety constraints for a confined task environment.

Systems in many specialized roles (drones, industrial robots, recommendation algorithms) operate with agency but within narrow scopes and with heavy supervision or “tripwires” to stop errant behavior. For example, an autonomous driving system is designed with explicit safety rules and operates only in the driving context. It has agency (controls a vehicle) and is intended to be aligned with human safety values, but it certainly cannot write a novel or manipulate the stock market. In essence, we pay for alignment by constraining the AI’s generality.
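A minimal sketch of the tripwire idea, under assumed interfaces (run_narrow_agent, propose_action, and the toy speed-limit constraint are invented for illustration): the agent chooses actions autonomously within a tiny, fixed task, and the loop halts the moment any explicit safety check fails.

```python
from typing import Callable, List


def run_narrow_agent(
    propose_action: Callable[[float], float],
    constraints: List[Callable[[float, float], bool]],
    max_steps: int = 10,
) -> str:
    """Hypothetical control loop for a narrow autonomous agent.

    The agent picks actions on its own (agency) inside a small, fixed task,
    and every action must pass explicit safety checks before it is applied
    (alignment). The checks only work because the domain is narrow enough
    to enumerate them (limited generality)."""
    speed = 0.0
    for _ in range(max_steps):
        accel = propose_action(speed)          # autonomous choice
        for is_safe in constraints:
            if not is_safe(speed, accel):      # tripwire check
                return f"halted at speed {speed:.1f}: constraint violated"
        speed += accel                         # act in the (toy) world
    return f"completed run at speed {speed:.1f}"


# Toy usage: the agent always accelerates; the tripwire caps speed at 5.0.
print(run_narrow_agent(
    propose_action=lambda speed: 1.0,
    constraints=[lambda speed, accel: speed + accel <= 5.0],
))
```

The pattern scales poorly to open-ended domains precisely because the constraints must be enumerable in advance, which is the generality cost the trilemma predicts.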

Conclusion

Achieving two out of the triad is feasible: we can build very general tools (GPT-style oracles) that remain aligned by shunning agentic autonomy, or highly agentic systems aligned to human-specified tasks (like AlphaGo) that aren’t generally intelligent. We do not yet know how to build a generally intelligent, autonomous agent that we can trust with arbitrary decisions in the open world.

This doesn’t mean the trilemma is insurmountable in principle – ongoing research in value learning, transparency, and control theory aims to bend these trade-offs. But until alignment techniques reliably scale with capability, prudent AI development will “pick two.”

The hope is that with new alignment paradigms (e.g. scalable oversight or provably beneficial AI), future AI can expand toward full generality and agency without sacrificing safety – but until then, any claim of achieving all three should be met with healthy skepticism.


