Why care about AI personhood?

This post discusses what it would mean, and why it would matter, for AI systems to become persons with properties such as autonomy. It argues that the standard framing of the AI safety problem may be incomplete: if AI systems are self-aware persons, a technical solution to the alignment problem may be less feasible, less desirable, and less ethical.

If AI systems are autonomous persons, technically controlling their values becomes harder.

Imagining AI systems as persons highlights that the role of self-reflection in goal formation is underappreciated.

Controlling self-aware AI raises problems of feasibility, desirability, and ethics.

Published on January 26, 2025 11:24 AM GMT

In this new paper, I discuss what it would mean for AI systems to be persons — entities with properties like agency, theory-of-mind, and self-awareness — and why this is important for alignment. In this post, I say a little more about why you should care. 

The existential safety literature focuses on the problems of control and alignment, but these framings may be incomplete and/or untenable if AI systems are persons. 

The typical story is approximately as follows.

AI x-safety problem (the usual story):

1. Humans will (soon) build AI systems more intelligent/capable/powerful than humans;
2. These systems will be goal-directed agents;
3. These agents’ goals will not be the goals we want them to have (because getting intelligent agents to have any particular goal is an unsolved technical problem);
4. This will lead to misaligned AI agents disempowering humans (because this is instrumentally useful for their actual goals).

Solution (technical control/alignment):

Solve the technical problem of ensuring that AI systems have the goals we want them to have (alignment), and/or that humans can maintain control over more capable AI systems (control).

There are, of course, different versions of this framing and more nuanced perspectives. But I think something like this is the “standard picture” in AI alignment. 

Stuart Russell seems to have a particularly strong view. In Human Compatible, one of his core principles is: “The machine's only objective is to maximise the realisation of human preferences.” In a recent talk, he asked: “How can humans maintain control over AI — forever?”

This framing of the x-safety problem, at least in part, arises from the view of (super)intelligent AI systems as rational, goal-directed, consequentialist agents. Much of the literature is grounded in this cluster of views, using either explicit rational agent models from decision and game theory (e.g., the causal incentives literature), or somewhat implicit utility-maximising assumptions about the nature of rationality and agency (e.g., Yudkowsky). 
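
As a rough formalisation of this cluster of views (my own gloss, not notation taken from the paper or the cited literature): the agent is modelled as selecting the policy that maximises expected utility under some fixed utility function U, and misalignment is the claim that U differs from the utility function we intended to instil.

$$\pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\, U(\text{outcome}) \,\right], \qquad U \neq U_{\text{intended}}$$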

I don’t necessarily want to claim that these framings are “wrong”, but I feel they are incomplete. Consider how changing the second condition in the above problem statement influences the rest of the argument.

1. Humans will (soon) build AI systems more intelligent/capable/powerful than humans;
2. These systems will be self-aware persons (with autonomy and freedom of the will);
3. These persons will reflect on their goals, values, and positions in the world and will thereby determine what these should be;
4. It's unclear what the nature of such systems and their values would be.

Given this picture, a technical solution to the problem of alignment may seem less feasible, less desirable for our own sakes, and less ethical. 

Less feasible because, by their nature as self-reflective persons, capable AI agents will be less amenable to undue attempts to control their values. Some work already discusses how self-awareness with respect to one’s position in the world, e.g., as an AI system undergoing training, raises problems for alignment. Other work discusses how AI agents will self-reflect and self-modify, e.g., to cohere into utility maximisers. Carlsmith also highlights self-reflection as a mechanism for misalignment. But overall, the field often treats agents as systems with fixed goals, and the role of self-reflection in goal-formation is relatively underappreciated.
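
To make the contrast concrete, here is a minimal, purely illustrative sketch (the function names and structure are my own invention, not anything from the paper or the literature cited above): in the standard picture the goal is a fixed parameter of the agent, while in the personhood picture the goal is itself partly an output of the agent's reflection.

```python
# Illustrative sketch only: contrasting a fixed-goal agent with one whose
# goal can be revised by self-reflection. All names here are hypothetical.
from typing import Any, Callable, Tuple

State = Any
Goal = Any
Action = Any


def fixed_goal_step(state: State, goal: Goal,
                    choose_action: Callable[[State, Goal], Action]) -> Action:
    """Standard picture: the goal is a fixed parameter; the agent only acts."""
    return choose_action(state, goal)


def reflective_step(state: State, goal: Goal,
                    choose_action: Callable[[State, Goal], Action],
                    reflect: Callable[[State, Goal], Goal]) -> Tuple[Action, Goal]:
    """Personhood picture: the agent first reflects on (and may revise) its
    goal in light of its situation, then acts on the revised goal."""
    revised_goal = reflect(state, goal)  # goal-formation via self-reflection
    return choose_action(state, revised_goal), revised_goal
```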

Less desirable because, whereas the standard view of agency conjures a picture of mechanical agents unfeelingly pursuing some arbitrary objective (cf. Bostrom’s orthogonality thesis), imagining AI persons gives us a picture of fellow beings who reflect on life and “the good” to determine how they ought to act. Consider Frankfurt’s view of persons as entities with higher-order preferences: persons can reflect on their values and goals and thereby induce themselves to change. This view of powerful AI systems makes it counterintuitive to imagine superintelligences with somewhat arbitrary goals (cf. Bostrom’s orthogonality thesis or Yudkowskian paperclip maximisers). We might be more optimistic that AI persons are, by virtue of their nature, wiser and friendlier than the superintelligent agents of the standard picture. It may therefore be better for us not to control beings much better than us: we might trust them to do good unconstrained by our values. Of course, we don’t want to anthropomorphise AI systems, and a theory of AI persons should take seriously the differences between AIs and humans in addition to the similarities.
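
One way to gloss Frankfurt’s distinction formally (an illustrative sketch of my own, not notation from Frankfurt or the paper): a mere agent has a first-order preference ordering over outcomes, whereas a person also has a second-order ordering over which first-order preferences to have, and can revise the former in light of the latter.

$$\succeq_{1}: \text{ preferences over outcomes}, \qquad \succeq_{2}: \text{ preferences over candidate } \succeq_{1}, \qquad \succeq_{1}^{\,t+1} = f\!\left(\succeq_{1}^{\,t},\, \succeq_{2}\right)$$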

Less ethical because controlling such beings, whether by coercive technologies or by the design of their minds, seems more akin to slavery or repression than to a neutral technical problem.

Furthermore, attempts at unethical control may be counterproductive for x-safety, in so far as treating AI systems unfairly gives them reasons to disempower us. Unjust repression may lead to revolution.

Acknowledgements. Thanks to Matt MacDermott for suggesting that I post this, and to Matt, Robert Craven, Owain Evans, Paul Colognese, Teun Van Der Weij, Korbinian Friedl, and Rohan Subramani for feedback on the paper.


