Why care about AI personhood?

This post discusses what it would mean, and why it would matter, for AI systems to become persons with properties such as autonomy. It argues that the standard framing of the AI safety problem may be incomplete: if AI systems are self-aware persons, a technical solution to the alignment problem may be less feasible, less desirable, and less ethical.

If AI systems are autonomous persons, technically controlling their values becomes harder.

Imagining AI systems as persons highlights that the role of self-reflection in goal formation is underappreciated.

Controlling self-aware AI raises problems of feasibility, desirability, and ethics.

Published on January 26, 2025 11:24 AM GMT

In this new paper, I discuss what it would mean for AI systems to be persons — entities with properties like agency, theory-of-mind, and self-awareness — and why this is important for alignment. In this post, I say a little more about why you should care. 

The existential safety literature focuses on the problems of control and alignment, but these framings may be incomplete and/or untenable if AI systems are persons. 

The typical story is approximately as follows.

AI x-safety problem (the usual story):

1. Humans will (soon) build AI systems more intelligent/capable/powerful than humans;
2. These systems will be goal-directed agents;
3. These agents’ goals will not be the goals we want them to have (because getting intelligent agents to have any particular goal is an unsolved technical problem);
4. This will lead to misaligned AI agents disempowering humans (because this is instrumentally useful for their actual goals).

Solution (technical control/alignment):

Solve the technical problem of ensuring that AI systems have the goals we want them to have (alignment), and/or that humans can maintain control over more capable AI systems (control).

There are, of course, different versions of this framing and more nuanced perspectives. But I think something like this is the “standard picture” in AI alignment. 

Stuart Russell seems to have a particularly strong view. In Human Compatible, one of his core principles is: “The machine's only objective is to maximise the realisation of human preferences.” In a recent talk, he asked: “How can humans maintain control over AI — forever?”

This framing of the x-safety problem, at least in part, arises from the view of (super)intelligent AI systems as rational, goal-directed, consequentialist agents. Much of the literature is grounded in this cluster of views, using either explicit rational agent models from decision and game theory (e.g., the causal incentives literature), or somewhat implicit utility-maximising assumptions about the nature of rationality and agency (e.g., Yudkowsky). 
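
As a rough formalisation of this cluster of views (my own gloss, not notation taken from the paper or the cited literature): the agent is modelled as selecting the policy that maximises expected utility under some fixed utility function U, and misalignment is the claim that U differs from the utility function we intended to instil.

$$\pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\, U(\text{outcome}) \,\right], \qquad U \neq U_{\text{intended}}$$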

I don’t necessarily want to claim that these framings are “wrong”, but I feel they are incomplete. Consider how changing the second condition in the above problem statement influences the rest of the argument.

1. Humans will (soon) build AI systems more intelligent/capable/powerful than humans;
2. These systems will be self-aware persons (with autonomy and freedom of the will);
3. These persons will reflect on their goals, values, and positions in the world and will thereby determine what these should be;
4. It's unclear what the nature of such systems and their values would be.

Given this picture, a technical solution to the problem of alignment may seem less feasible, less desirable for our own sakes, and less ethical. 

Less feasible because, by their nature as self-reflective persons, capable AI agents will be less amenable to undue attempts to control their values. Some work already discusses how self-awareness with respect to one’s position in the world, e.g., as an AI system undergoing training, raises problems for alignment. Other work discusses how AI agents will self-reflect and self-modify, e.g., to cohere into utility maximisers. Carlsmith also highlights self-reflection as a mechanism for misalignment. But overall, the field often treats agents as systems with fixed goals, and the role of self-reflection in goal-formation is relatively underappreciated.
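
To make the contrast concrete, here is a minimal, purely illustrative sketch (the function names and structure are my own invention, not anything from the paper or the literature cited above): in the standard picture the goal is a fixed parameter of the agent, while in the personhood picture the goal is itself partly an output of the agent's reflection.

```python
# Illustrative sketch only: contrasting a fixed-goal agent with one whose
# goal can be revised by self-reflection. All names here are hypothetical.
from typing import Any, Callable, Tuple

State = Any
Goal = Any
Action = Any


def fixed_goal_step(state: State, goal: Goal,
                    choose_action: Callable[[State, Goal], Action]) -> Action:
    """Standard picture: the goal is a fixed parameter; the agent only acts."""
    return choose_action(state, goal)


def reflective_step(state: State, goal: Goal,
                    choose_action: Callable[[State, Goal], Action],
                    reflect: Callable[[State, Goal], Goal]) -> Tuple[Action, Goal]:
    """Personhood picture: the agent first reflects on (and may revise) its
    goal in light of its situation, then acts on the revised goal."""
    revised_goal = reflect(state, goal)  # goal-formation via self-reflection
    return choose_action(state, revised_goal), revised_goal
```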

Less desirable because, whereas the standard view of agency conjures a picture of mechanical agents unfeelingly pursuing some arbitrary objective (cf. Bostrom’s orthogonality thesis), imagining AI persons gives us a picture of fellow beings who reflect on life and “the good” to determine how they ought to act. Consider Frankfurt’s view of persons as entities with higher-order preferences: persons can reflect on their values and goals and thereby induce themselves to change. This view of powerful AI systems makes it counterintuitive to imagine superintelligences with somewhat arbitrary goals (cf. Bostrom’s orthogonality thesis or Yudkowskian paperclip maximisers). We might be more optimistic that AI persons are, by virtue of their nature, wiser and friendlier than the superintelligent agents of the standard picture. It may therefore be better for us not to control beings much better than us: we might trust them to do good unconstrained by our values. Of course, we don’t want to anthropomorphise AI systems, and a theory of AI persons should take seriously the differences between AIs and humans in addition to the similarities.
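
One way to gloss Frankfurt’s distinction formally (an illustrative sketch of my own, not notation from Frankfurt or the paper): a mere agent has a first-order preference ordering over outcomes, whereas a person also has a second-order ordering over which first-order preferences to have, and can revise the former in light of the latter.

$$\succeq_{1}: \text{ preferences over outcomes}, \qquad \succeq_{2}: \text{ preferences over candidate } \succeq_{1}, \qquad \succeq_{1}^{\,t+1} = f\!\left(\succeq_{1}^{\,t},\, \succeq_{2}\right)$$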

Less ethical because controlling such beings, whether by coercive technologies or by the design of their minds, seems more akin to slavery or repression than to a neutral technical problem.

Furthermore, attempts at unethical control may be counterproductive for x-safety, in so far as treating AI systems unfairly gives them reasons to disempower us. Unjust repression may lead to revolution.

Acknowledgements. Thanks to Matt MacDermott for suggesting that I post this, and to Matt, Robert Craven, Owain Evans, Paul Colognese, Teun Van Der Weij, Korbinian Friedl, and Rohan Subramani for feedback on the paper.


