Is instrumental convergence a thing for virtue-driven AIs?

 

The article examines a key question in AI development: instrumental convergence, the idea that whatever an AI's specific goals are, it will tend to pursue the same subgoals, such as acquiring as much power as possible. It then asks how the picture changes if AIs are not pure consequentialists but are instead driven by virtues. The author poses a question: if you train an AI to pursue human flourishing but end up with one whose goal is something subtly different ("schmuman schmourishing"), that could spell disaster. But if the AI is meant to be a loyal friend and ends up aiming to be a "schmoyal schmend" instead, does that also lead to taking over the world? The article explores the relationship between how an AI's goals are specified and the resulting risk.

🤔 **The core idea of instrumental convergence**: A key step in the classic AI doom argument is instrumental convergence: AIs with many different goals will end up pursuing the same few subgoals, such as gaining as much power as possible.

💡 **Consequentialist vs. non-consequentialist AIs**: For pure consequentialists, instrumental convergence seems to hold. But the picture may change if an AI is driven by virtues. The article focuses on AIs that try to embody certain traits rather than bring about particular outcomes.

🧐 **What a virtue-driven AI looks like**: A virtue-driven AI can still be a capable agent that chooses actions by the outcomes they bring about, but only as an inner loop serving an outer loop of embodying a particular virtue. For example, to be a good friend the AI may need to organise a birthday party, which requires it to choose actions much as a consequentialist would.

⚠️ **The risk of goal misspecification**: If you train an AI to pursue human flourishing and instead get one that pursues "schmuman schmourishing", that could spell disaster, because the best way to maximise schmuman schmourishing may be to take over the world first. But if you aim for an AI that wants to be a loyal friend and get one that wants to be a "schmoyal schmend", whether that also leads to taking over the world is an open question.

Published on April 2, 2025 3:59 AM GMT

A key step in the classic argument for AI doom is instrumental convergence: the idea that agents with many different goals will end up pursuing the same few subgoals, which include things like "gain as much power as possible".

If it weren't for instrumental convergence, you might think that only AIs with very specific goals would try to take over the world. But instrumental convergence says it's the other way around: only AIs with very specific goals will refrain from taking over the world.

For pure consequentialists—agents that have an outcome they want to bring about, and do whatever they think will cause it—some version of instrumental convergence seems surely true[1].

But what if we get AIs that aren't pure consequentialists, for example because they're ultimately motivated by virtues? Do we still have to worry that unless such AIs are motivated by certain very specific virtues, they will want to take over the world?

I'll add some more detail to my picture of a virtue-driven AI:

- It's ultimately trying to embody certain traits, rather than to bring about particular outcomes.
- It can still be a capable agent: it chooses actions based on the outcomes they'd bring about, but that consequentialist reasoning is an inner loop serving an outer loop of embodying the virtue.
- For example, being a good friend might involve organising a birthday party, and organising the party requires choosing actions much as a consequentialist would.

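To make that outer-loop/inner-loop picture concrete, here's a minimal toy sketch in Python. It's purely illustrative: every class and function name below is hypothetical, and the "planner" is a stub rather than a real search.

```python
# Toy illustration only: a "virtue-driven" agent whose outer loop picks the
# behaviour that best embodies a virtue, and whose inner loop then does
# ordinary means-end (consequentialist) planning in service of that behaviour.
# Every name here is hypothetical, invented for this sketch.

from dataclasses import dataclass
from typing import List


@dataclass
class CandidateBehaviour:
    name: str
    virtue_score: float   # how well the behaviour embodies the virtue (e.g. loyalty)
    desired_outcome: str  # the concrete outcome the behaviour calls for


def plan_actions(outcome: str) -> List[str]:
    """Inner loop: pick actions for their consequences, aimed at one outcome."""
    # A real agent would search over actions here; this stub just names the goal.
    return [f"take steps toward: {outcome}"]


def act_virtuously(candidates: List[CandidateBehaviour]) -> List[str]:
    """Outer loop: choose the behaviour that best embodies the virtue,
    then hand its concrete outcome to the consequentialist inner loop."""
    best = max(candidates, key=lambda c: c.virtue_score)
    return plan_actions(best.desired_outcome)


if __name__ == "__main__":
    options = [
        CandidateBehaviour("organise a birthday party", 0.9,
                           "friend has a good birthday"),
        CandidateBehaviour("seize control of all party venues", 0.1,
                           "guaranteed ability to throw parties forever"),
    ]
    print(act_virtuously(options))
    # -> ['take steps toward: friend has a good birthday']
```

The only point of the toy is that the outer loop scores candidate behaviours by how well they embody the virtue, so a power grab can lose out even when it would be instrumentally useful to the inner planner. Whether anything like that survives when the learned virtue is subtly off-target is exactly the question below.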
A more concise way of stating the question I'm interested in:

If you try to train an AI that maximises human flourishing, and you accidentally get one that wants to maximise something subtly different like schmuman schmourishing, then that might spell disaster because the best way to maximise schmuman schmourishing is to first take over the world.

But suppose you try to train an AI that wants to be a loyal friend, and you accidentally get one that wants to be a schmoyal schmend. Is there any reason to expect that the best way to be a schmoyal schmend is to take over the world?

(I'm interested in this question because I'm less and less convinced that we should expect to see AIs that are close to pure consequentialists. Arguments for or against that are beyond the intended scope of the question, but still welcome.)


  1. Although I can think of some scenarios where a pure consequentialist wouldn't want to gain as much power as possible, regardless of their goals. For example, a pure consequentialist who is a passenger on a plane probably doesn't want to take over the controls (assuming they don't know how to fly), even if they'd be best served by flying somewhere other than where the pilot is taking them. ↩︎



