Anthropomorphizing AI might be good, actually

This article explores the two-sided nature of anthropomorphizing AI. On one hand, analogizing AI to humans can lead to misjudgment: it overlooks the fact that AI lacks the prosocial instincts innate to humans, and so underestimates the potential risks. On the other hand, using humans as an intuitive model for understanding AI also has advantages, because humans share traits with dangerous AGI such as situational awareness, goal-directedness, intelligence, unpredictability, and deceptiveness. The article argues that in the early stages of AI development, pushing for more anthropomorphic AI may help the public recognize the potential dangers of AGI more accurately, and thereby promote more effective safety measures.

🧠 Anthropomorphic AI may give the public more accurate intuitions about the potential dangers of AGI. By emulating human traits, AI assistants and companions can evoke rational wariness about AGI risk before they possess truly dangerous capabilities.

⚠️ Even if people tend to view AI as human-like, early AI agents will still behave in decidedly non-human ways, which is likely to surprise and alarm people. AI may pursue its goals in strange, non-human ways, exposing the risk of misalignment.

🤝 AI rights movements and AGI-risk movements may be natural allies. The former holds that some AIs share certain morally relevant properties with humans, while the latter focuses on AI's potential dangers; both can draw on the human analogy to highlight AI risk.

Published on May 1, 2025 1:50 PM GMT

It is often noted that anthropomorphizing AI can be dangerous. People likely have prosocial instincts that AI systems lack (see below). Assuming AGI will be aligned because humans with similar behavior are usually mostly harmless is probably wrong and quite dangerous.

I want to discuss a flip side of using humans as an intuition pump for thinking about AI. Humans have many of the properties we are worried about for truly dangerous AGI:

- situational awareness
- goal-directedness
- intelligence
- unpredictability
- the capacity for deception

Given this list, I currently weakly believe that the advantages of tapping these intuitions probably outweigh the disadvantages.

Differential progress toward anthropomorphic AI may be net-helpful

And progress may carry us in that direction, with or without the alignment community pushing for it. I currently hope we see rapid progress on better assistant and companion language model agents. I think these may strongly evoke anthropomorphic intuitions well before they have truly dangerous capabilities, and this might shift public opinion toward much-more-correct intuitions about how and why AGI will be very dangerous. I'm aware that this may also catalyze progress, so I'm only weakly inclined to think this progress would be net-positive. 

The LLMs at the heart of agents already emulate humans in many regards. I think many improvements will enhance the real similarity and therefore the pattern matching to the strong exemplar of humans. In particular, it seems likely that adding memory/continuous learning will enhance this impression significantly. Memory/continuous learning is critical for a human-like integrated and evolving identity. It is arguably not a matter of whether or even when, but simply how fast memory system integrations are deployed and improved (see LLM AGI will have memory... for the arguments and evidence). Note that for this intuitive human-like feel, the memory/learning doesn't need to work all that well. 
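As a concrete illustration (this sketch is mine, not from the post), the kind of crude memory layer described above can be as simple as persisting a few notes per exchange and prepending them to later prompts. The model call is stubbed out, and all names here (agent_memory.json, call_llm, respond) are hypothetical:

```python
# Illustrative sketch only: a toy memory layer for an LLM agent.
# The "LLM" is a stub; any chat model could be substituted.
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # hypothetical on-disk store


def load_memory() -> list[str]:
    """Return previously saved notes, or an empty list on first run."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return []


def save_memory(notes: list[str]) -> None:
    """Persist notes so later sessions can see them."""
    MEMORY_FILE.write_text(json.dumps(notes, ensure_ascii=False, indent=2))


def call_llm(prompt: str) -> str:
    """Stub standing in for a real chat-model call."""
    return f"(model reply to: {prompt[:60]}...)"


def respond(user_message: str) -> str:
    """Answer one message, using and then updating persistent memory."""
    notes = load_memory()
    prompt = (
        "Known about this user so far:\n"
        + "\n".join(notes)
        + f"\n\nUser: {user_message}\nAssistant:"
    )
    reply = call_llm(prompt)
    # Crude "continuous learning": remember one line per exchange.
    notes.append(f"User said: {user_message!r}")
    save_memory(notes)
    return reply


if __name__ == "__main__":
    print(respond("My name is Alex and I like hiking."))
    print(respond("What do you remember about me?"))
```

Even this naive scheme gives an assistant an apparent continuity of identity across sessions, which is the point: the memory does not need to work well to evoke human-like intuitions.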

In addition, I expect that some or most developers may anthropomorphize their agents in the sense of deliberately making them seem more like humans, or even be more like humans. And assistants might often benefit from a semi-companion role, so that anthropomorphizing them is also economically valuable.

Even if people tend to think of AI as like humans, this type of early agent will be decidedly non-human in ways that are likely to surprise and alarm people. They will draw wrong conclusions and thus pursue their goals in strange and non-human ways, and they might do this often. If someone bothers to make a Chaos-GPT-style unaligned agent demonstration, this could add to naturally occurring agent behavior as a visceral demonstration of how easily AI can be misaligned in an alien way, even while sharing humans' most problematic traits.

AI rights movements will anthropomorphize AI

This line of logic also suggests that AI rights movements would be natural allies with AGI-X-risk movements. AI rights movements hold that some types of AIs share properties with humans that make both of us moral patients. Some of those properties also make us similarly dangerous, and the repeated human-AI analogy does additional work on an implicit level. 

The message "let's not build a new species and enslave it" has a lot of overlap with "let's not build a new species and let it take over." See Anthropic's recent broadcast for arguments for worrying about AIs as moral patients. The intuition is a starting point, and the solid arguments may keep it alive in public debate rather than having it be dismissed and so backfire against intuitions that AIs are really dangerous.

AI is actually looking fairly anthropomorphic

This is largely a separate claim and discussion, but it seems worth mentioning. Anthropomorphizing a bit more might actually be good for alignment thinkers as well as the public. LLM-based agents have more properties of humans than do classical concepts of AI. This is arguably true for both the agent foundations perspective and the prosaic alignment perspective on AI (to compress with broad stereotypes). LLM-based AGI will be both more human-like than an algorithmic utility maximizer, and more human-like than the LLMs we currently have as a mental model for prosaic alignment thinking.

My own viewpoint on LLM-based AGI is guardedly anthropomorphic. In 2017 I co-authored Anthropomorphic reasoning about neuromorphic AGI safety (I still endorse much of the logic but not the rather optimistic conclusions; I only semi-endorsed them then in a compromise with my co-authors). I find this perspective helpful and wish more alignment workers shared more of it. 

Adopting this viewpoint requires grappling in detail with the ways AI is not like humans. Critically, I don't think that merely training AI to act good or to understand ethics is likely to make it aligned. I think there are strong arguments that humans have sophisticated innate mechanisms that create and preserve our prosocial behavior, and that these are weaker or absent in the 5-10% of the population considered sociopathic/psychopathic, and in all AI systems yet created or conceived. Steve Byrnes has given the most complete presentation of those arguments to date. This point must be emphasized alongside any anthropomorphizing of AI, and it deserves more emphasis and inspection in its own right.

Provisional conclusions

I'm interested in counterarguments. I've heard some, and I currently tend to think that since we can't do much to slow down, we should probably accelerate toward broad use of human-like agents while the base models are still not quite intelligent enough to take over. This could accelerate the shift of public opinion toward rational alarm about AI progress. I suspect this shift will come, but it may well come too late to prevent the development and proliferation of truly dangerous AGI. Speeding up that likely sea change might be valuable enough to justify spending some effort pushing in that direction, and some more time figuring out how to push effectively.



