Anthropomorphizing AI might be good, actually

This article explores the two-sided nature of anthropomorphizing AI. On one hand, analogizing AI to humans can lead to misjudgment: it overlooks the fact that AI lacks the prosocial instincts innate to humans, and so underestimates the potential risks. On the other hand, using humans as an intuitive model for understanding AI also has advantages, because humans share traits with dangerous AGI such as situational awareness, goal-directedness, intelligence, unpredictability, and deceptiveness. The article argues that in the early stages of AI development, pushing for more anthropomorphic AI may help the public recognize the potential dangers of AGI more accurately, and thereby promote more effective safety measures.

🧠 Anthropomorphic AI may give the public more accurate intuitions about the potential dangers of AGI. By emulating human traits, AI assistants and companions can evoke rational wariness about AGI risk before they possess truly dangerous capabilities.

⚠️ Even if people tend to view AI as human-like, early AI agents will still behave in decidedly non-human ways, which is likely to surprise and alarm people. AI may pursue its goals in strange, non-human ways, exposing the risk of misalignment.

🤝 AI rights movements and AGI-risk movements may be natural allies. The former holds that some AIs share certain morally relevant properties with humans, while the latter focuses on AI's potential dangers; both can draw on the human analogy to highlight AI risk.

Published on May 1, 2025 1:50 PM GMT

It is often noted that anthropomorphizing AI can be dangerous. People likely have prosocial instincts that AI systems lack (see below). Assuming AGI will be aligned because humans with similar behavior are usually mostly harmless is probably wrong and quite dangerous.

I want to discuss a flip side of using humans as an intuition pump for thinking about AI. Humans have many of the properties we are worried about for truly dangerous AGI:

- situational awareness
- goal-directedness
- intelligence
- unpredictability
- the capacity for deception

Given this list, I currently weakly believe that the advantages of tapping these intuitions probably outweigh the disadvantages.

Differential progress toward anthropomorphic AI may be net-helpful

And progress may carry us in that direction, with or without the alignment community pushing for it. I currently hope we see rapid progress on better assistant and companion language model agents. I think these may strongly evoke anthropomorphic intuitions well before they have truly dangerous capabilities, and this might shift public opinion toward much-more-correct intuitions about how and why AGI will be very dangerous. I'm aware that this may also catalyze progress, so I'm only weakly inclined to think this progress would be net-positive. 

The LLMs at the heart of agents already emulate humans in many regards. I think many improvements will enhance the real similarity and therefore the pattern matching to the strong exemplar of humans. In particular, it seems likely that adding memory/continuous learning will enhance this impression significantly. Memory/continuous learning is critical for a human-like integrated and evolving identity. It is arguably not a matter of whether or even when, but simply how fast memory system integrations are deployed and improved (see LLM AGI will have memory... for the arguments and evidence). Note that for this intuitive human-like feel, the memory/learning doesn't need to work all that well. 
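As a concrete illustration (this sketch is mine, not from the post), the kind of crude memory layer described above can be as simple as persisting a few notes per exchange and prepending them to later prompts. The model call is stubbed out, and all names here (agent_memory.json, call_llm, respond) are hypothetical:

```python
# Illustrative sketch only: a toy memory layer for an LLM agent.
# The "LLM" is a stub; any chat model could be substituted.
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # hypothetical on-disk store


def load_memory() -> list[str]:
    """Return previously saved notes, or an empty list on first run."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return []


def save_memory(notes: list[str]) -> None:
    """Persist notes so later sessions can see them."""
    MEMORY_FILE.write_text(json.dumps(notes, ensure_ascii=False, indent=2))


def call_llm(prompt: str) -> str:
    """Stub standing in for a real chat-model call."""
    return f"(model reply to: {prompt[:60]}...)"


def respond(user_message: str) -> str:
    """Answer one message, using and then updating persistent memory."""
    notes = load_memory()
    prompt = (
        "Known about this user so far:\n"
        + "\n".join(notes)
        + f"\n\nUser: {user_message}\nAssistant:"
    )
    reply = call_llm(prompt)
    # Crude "continuous learning": remember one line per exchange.
    notes.append(f"User said: {user_message!r}")
    save_memory(notes)
    return reply


if __name__ == "__main__":
    print(respond("My name is Alex and I like hiking."))
    print(respond("What do you remember about me?"))
```

Even this naive scheme gives an assistant an apparent continuity of identity across sessions, which is the point: the memory does not need to work well to evoke human-like intuitions.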

In addition, I expect that some or most developers may anthropomorphize their agents in the sense of deliberately making them seem more like humans, or even be more like humans. And assistants might often benefit from a semi-companion role, so that anthropomorphizing them is also economically valuable.

Even if people tend to think of AI as like humans, this type of early agent will be decidedly non-human in ways that are likely to surprise and alarm people. They will draw wrong conclusions and thus pursue their goals in strange and non-human ways, and they might do this often. If someone bothers to make a Chaos-GPT-style unaligned agent demonstration, this could add to naturally occurring agent behavior as a visceral demonstration of how easily AI can be misaligned in an alien way, even while sharing humans' most problematic traits.

AI rights movements will anthropomorphize AI

This line of logic also suggests that AI rights movements would be natural allies with AGI-X-risk movements. AI rights movements hold that some types of AIs share properties with humans that make both of us moral patients. Some of those properties also make us similarly dangerous, and the repeated human-AI analogy does additional work on an implicit level. 

The message "let's not build a new species and enslave it" has a lot of overlap with "let's not build a new species and let it take over." See Anthropic's recent broadcast for arguments for worrying about AIs as moral patients. The intuition is a starting point, and the solid arguments may keep it alive in public debate rather than having it be dismissed and so backfire against intuitions that AIs are really dangerous.

AI is actually looking fairly anthropomorphic

This is largely a separate claim and discussion, but it seems worth mentioning. Anthropomorphizing a bit more might actually be good for alignment thinkers as well as the public. LLM-based agents have more properties of humans than do classical concepts of AI. This is arguably true for both the agent foundations perspective and the prosaic alignment perspective on AI (to compress with broad stereotypes). LLM-based AGI will be both more human-like than an algorithmic utility maximizer, and more human-like than the LLMs we currently have as a mental model for prosaic alignment thinking.

My own viewpoint on LLM-based AGI is guardedly anthropomorphic. In 2017 I co-authored Anthropomorphic reasoning about neuromorphic AGI safety (I still endorse much of the logic but not the rather optimistic conclusions; I only semi-endorsed them then in a compromise with my co-authors). I find this perspective helpful and wish more alignment workers shared more of it. 

Adopting this viewpoint requires grappling in detail with the ways AI is not like humans. Critically, I don't think that merely training AI to act good or to understand ethics is likely to make it aligned. I think there are strong arguments that humans have sophisticated innate mechanisms that create and preserve our prosocial behavior, and that these are weaker or absent in the 5-10% of the population considered sociopathic/psychopathic, and in all AI systems yet created or conceived. Steve Byrnes has given the most complete presentation of those arguments to date. This point must be emphasized alongside any anthropomorphizing of AI, and it deserves more emphasis and inspection in its own right.

Provisional conclusions

I'm interested in counterarguments. I've heard some, and I currently tend to think that since we can't do much to slow down, we should probably accelerate toward broad use of human-like agents while the base models are still not quite intelligent enough to take over. This could accelerate the shift of public opinion toward rational alarm about AI progress. I suspect this shift will come, but it may well come too late to prevent the development and proliferation of truly dangerous AGI. Speeding up that likely sea change might be valuable enough to justify spending some effort pushing in that direction, and some more time figuring out how to push effectively.



