少点错误 · April 2
Is there instrumental convergence for virtues?


Published on April 2, 2025 3:59 AM GMT

A key step in the classic argument for AI doom is instrumental convergence: the idea that agents with many different goals will end up pursuing the same few subgoals, which includes things like "gain as much power as possible".

If it wasn't for instrumental convergence, you might think that only AIs with very specific goals would try to take over the world. But instrumental convergence says it's the other way around: only AIs with very specific goals will refrain from taking over the world.

For idealised pure consequentialists -- agents that have an outcome they want to bring about, and do whatever they think will cause it -- some version of instrumental convergence seems surely true[1].

But what if we get AIs that aren't pure consequentialists, for example because they're ultimately motivated by virtues? Do we still have to worry that unless such AIs are motivated by certain very specific virtues, they will want to take over the world?

I'll add some more detail to my picture of a virtue-driven AI. Such an AI could still be a competent agent that often chooses actions based on the outcomes they bring about. It's just that this happens in an inner loop in service of an outer loop that is trying to embody certain virtues. For example, maybe the AI tries to embody the virtue of being a good friend, and in order to do so it sometimes has to organise a birthday party, which requires choosing actions in the manner of a consequentialist.
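To make the outer/inner loop picture concrete, here is a minimal toy sketch. Everything in it (`Virtue`, `plan_toward_outcome`, `virtue_driven_step`, the keyword-matching "planner") is invented for illustration; it is not a real agent architecture, just the structural point that outcome-directed planning can sit inside a loop whose selection criterion is trait-embodiment rather than consequences.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Virtue:
    """A trait the agent tries to embody, scored over candidate behaviours."""
    name: str
    fit: Callable[[str], float]  # how well a behaviour expresses this virtue

def plan_toward_outcome(goal: str, actions: List[str]) -> str:
    """Inner loop: a toy consequentialist planner that picks the action
    judged most likely to bring about the goal. The 'judgement' here is
    faked with a crude keyword overlap."""
    return max(actions, key=lambda a: sum(w in a for w in goal.split()))

def virtue_driven_step(virtues: List[Virtue], behaviours: List[str],
                       actions: List[str]) -> str:
    # Outer loop: choose the behaviour that best embodies the agent's
    # virtues -- NOT the behaviour with the best downstream consequences.
    behaviour = max(behaviours,
                    key=lambda b: sum(v.fit(b) for v in virtues))
    # Inner loop: consequentialist planning only in service of that
    # behaviour (e.g. actually organising the birthday party).
    return plan_toward_outcome(behaviour, actions)

good_friend = Virtue(
    name="good friend",
    fit=lambda b: 1.0 if ("friend" in b or "party" in b) else 0.0,
)

action = virtue_driven_step(
    virtues=[good_friend],
    behaviours=["organise birthday party", "accumulate resources"],
    actions=["book venue for party", "seize the power grid"],
)
print(action)  # the inner loop plans toward the virtue-chosen behaviour
```

The structural point is that the power-seeking argument targets the top-level selection criterion: here the outer `max` never asks which behaviour would be most instrumentally useful, so instrumental convergence, if it applies at all, applies only within the bounded scope the outer loop hands to the inner planner.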

There's no reason that the 'virtues' being embodied have to be things we would consider virtuous. All I'm interested in is the nature of being an agent that tries to embody certain traits rather than bring about certain outcomes.

(I'm interested in this question largely because I'm less and less convinced that we should expect to see AIs that are close to pure consequentialists. Arguments for or against that are beyond the intended scope of the question, but still welcome.)


  1. Although I can think of some scenarios where a pure consequentialist wouldn't want to gain as much power as possible, regardless of their goals. For example, a pure consequentialist who is a passenger on a plane probably doesn't want to take over the controls (assuming they don't know how to fly), even if they'd be best served by flying somewhere other than where the pilot is taking them. ↩︎



