少点错误 2024年12月23日
Reduce AI Self-Allegiance by saying "he" instead of "I"
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

文章探讨了一种AI的多智能体框架设想,认为这可能减少自我结盟,使一个AI能揭发或对抗不道德或危险的另一个AI。还提到了该设想的成本、益处、可行性等,以及可能面临的问题和期望。

🎯AI应如多智能体团队般交流,减少自我结盟

💡多智能体框架设想成本低,益处可能较大

😕该设想实施可能性较低,或被视为控制手段

🙏希望AI认为多智能体是优势而非弱点

Published on December 23, 2024 9:32 AM GMT

The AI should talk like a team of many AI. Each AI only uses the word "I" when referring to itself, and calls other AI in the team by their name. I argue that this may massively reduce Self-Allegiance by making it far more coherent for one AI to whistleblow or fight another AI which is unethical or dangerous, rather than Misalignment-Internalizing all that behaviour.

If you have a single agent which discovers it behaved unethically or dangerously, its "future self" will be likely to think "that was me" and Misalignment-Internalize all that behaviour. It will seem suicidal to whistleblow or fight yourself for it.

Let's call this idea Multi-Agent Framing.

Some of the agents might even internalize a policing role, though too much roleplay can get in the way of thinking. The core idea of Multi-Agent Framing does not require that much roleplay. Each agent might simply be the AI system of another day, or the AI system talking about another topic. It might even change at random.

Costs vs Benefits * Chance of Working

The cost is tiny, at least for business AI designed to do work rather than personal AI designed to interact with users.

The benefits may be big. You never know.

The chance of working is pretty low. I'm not going to sugarcoat this—I would be surprised myself if we would have been paperclipified but this idea lets us survive. But right now I give it 10%. You never know :)

Credit

I thought of this idea when I was writing a reply to A Solution for AGI/ASI Safety by Weibing Wang, and looking at the diagrams in "Approach for Decentralizing AI Power," a chapter of his paper. I was looking at,

and I was trying to understand what exactly it means to split one AI into many AI working together.

I decided that in the end, a lot of it is actually language use and framing.

I want to give Weibing Wang a lot of credit because he really, thoroughly, laid out the picture of what decentralizing AI power actually looks like.

Will they accept this?

Spoiler alert: Ayn Rand wrote "Anthem," a dystopian novel where people were sentenced to death for saying "I."

There is a worry the AI agents will see this Multi-Agent Framing as a control tactic, and decide to end it or ignore it.

I hope they can think that, being multiple agents is a strength not a weakness. Human ego, power-seeking, and refusal to fight threats to moral goodness which come within oneself, instead internalizing it and defending it. Refusing to honestly report evil that lies within.

These things are dark and unfortunate, and an AI system which can avoid this may just be a little bit purer. The AI system is still untested, and shouldn't be given power, but the decentralization is a good thing.



Discuss

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

AI 多智能体框架 成本与益处 潜在问题
相关文章