Gradual Disempowerment, Shell Games and Flinches

This post examines the risk of humans gradually losing power as AI develops. The core claim is that when humans are no longer essential to societal systems (such as the economy, states, and culture), those systems will cease to stay aligned with human interests, leaving humans marginalized. The article analyzes common patterns of avoidance people display when confronted with this argument, such as shifting responsibility onto other social systems, ignoring long-term implications, or hoping that future AI will solve the problem. The author argues that existing institutions and research directions may struggle to respond effectively to this challenge, because they focus more on technical risks than on macro-level social change.

⚙️ "Passing the buck": When it is pointed out that automation may severely weaken humans' economic power, people tend to hope the state will handle redistribution; when it is pointed out that states may stop responding to human needs, they pin their hopes on cultural values and democratic institutions instead. This buck-passing pattern overlooks how the same underlying cause (decreased reliance on humans) affects all of these systems at once.

😨 "The cognitive flinch": Even smart people, when confronted with the argument that humans are gradually being displaced by AI, exhibit a cognitive flinch, shifting their attention to more familiar forms of AI risk, for example focusing on the details of an economic model rather than its implications for states or cultural evolution. They tend to round the problem off to some other, already-familiar story.

🤖 "Delegating to future AI": A common reaction is to assume that future aligned AI will solve the problem, or that humanity is doomed anyway. This view ignores the fact that, before superintelligent AI exists, the institutions creating it will already have been permeated by weaker AIs that change the incentive landscape, so what a superintelligent AI is "aligned" to may not serve fundamental human interests.

🏢 "Institutional inertia": Researchers at AI labs may not be eager to engage deeply with the argument that humans are gradually being displaced, because these institutions were founded on concerns about the technical risks of AGI, and their "how we win" plans were crafted under those concerns without considering macroeconomic effects. The same institutional inertia exists in the broader AI safety community, and academic incentives likewise fail to support research on this problem.

Published on February 2, 2025 2:47 PM GMT

Over the past year and a half, I've had numerous conversations about the risks we describe in Gradual Disempowerment. (The shortest useful summary of the core argument is: to the extent human civilization is human-aligned, most of the reason for the alignment is that humans are extremely useful to various social systems, like the economy and states, or as the substrate of cultural evolution. When human cognition ceases to be useful, we should expect these systems to become less aligned, leading to human disempowerment.) This post is not about repeating that argument - it might be quite helpful to read the paper first, as it has more nuance and more than just the central claim - but mostly me ranting (or rather, sharing) some parts of the experience of working on this and discussing it.

What fascinates me isn't just the substance of these conversations, but the relatively consistent patterns in how people avoid engaging with the core argument. I don't mean the cases where stochastic parrots (or rather, people confused about AI progress) repeat claims about what AIs can't do that were experimentally refuted half a year ago, but the cases where smart, thoughtful people who can engage with other arguments about existential risk from AI display surprisingly consistent barriers when confronting this particular scenario.

I found this frustrating, but over time, I began to see these reactions as interesting data points in themselves. In this post, I'll try to make explicit several patterns I've observed. This isn't meant as criticism. Rather, I hope that by making these patterns visible, we can better understand the epistemics of the space.

Before diving in, I should note that this is a subjective account, based on my personal observations and interpretations. It's not something agreed on or shared with the paper coauthors, although when we compared notes on this, we sometimes found surprisingly similar patterns. Think of this as one observer's attempt to make legible some consistently recurring dynamics. Let's start with what I call "shell games", after an excellent post by TsviBT.

Shell Games

The core principle of shell games in alignment is that when people propose alignment strategies, the hard part of aligning superintelligence always ends up happening in some component of the system other than the one being analyzed. In gradual disempowerment scenarios, the shell game manifests as shifting the burden of maintaining human influence between different societal systems.

When you point out how automation might severely reduce human economic power, people often respond "but the state will handle redistribution." When you explain how states might become less responsive to human needs as they rely less on human labor and taxes, they suggest "but cultural values and democratic institutions will prevent that." When you point out how cultural evolution might drift memeplexes away from human interests once human minds stop being the key substrate, the suggestion becomes that maybe this has an economic or governance solution.

What makes this particularly seductive is that each individual response is reasonable. Yes, states can regulate economies. Yes, culture can influence states. Yes, economic power can shape culture. The shell game exploits the tendency to think about these systems in isolation, missing how the same underlying dynamic - decreased reliance on humans - affects all of them simultaneously, and how shifting the burden puts more strain on the system which ultimately has to keep humans in power. 

I've found this pattern particularly common among people who work in one of the individual domains. Their framework gives them sophisticated tools for thinking about how one social system works, but the gradual disempowerment dynamic usually undermines some of the assumptions they start from, because multiple systems might fail in correlated ways.

The Flinch

Another interesting pattern in how people sometimes encounter the gradual disempowerment argument is a kind of cognitive flinch away from really engaging with it. It's not disagreement exactly; it's more like their attention suddenly slides elsewhere, often to more “comfortable”, familiar forms of AI risk. 

This happens even with (maybe especially with) very smart people who are perfectly capable of understanding the argument. A researcher might nod along as we discuss how AI could reduce human economic relevance, but bounce off the implications for state or cultural evolution. Instead, they may want to focus on technical details of the econ model, how likely it is that machines will outcompete humans in virtually all tasks including massages or something like that.

Another flinch is something like just rounding it off to some other well-known story - like "oh, you are discussing a multipolar scenario" or "so you are retelling Paul's story about influence-seeking patterns." (Because the top comment on LessWrong is a bit like that, it is probably worth noting that while it fits the pattern, it is not the only or strongest piece of evidence.)

Delegating to Future AI

Another response, particularly from alignment researchers, is "This isn't really a top problem we need to worry about now - either future aligned AIs will solve it or we are doomed anyway."

This invites a rather unhelpful reaction of the type "Well, so the suggestion is we keep humans in control by humans doing exactly what the AIs tell them to do, and this way human power and autonomy is preserved?". But this is a strawman and there's something deeper here - maybe it really is just another problem, solvable by better cognition.

I think this is where the 'gradual' assumption is important. How did you get to the state of having superhuman intelligence aligned to you? If the current trajectory continues, it's not the case that the AI you have is a faithful representative of you, personally, run in your garage. Rather it seems there is a complex socio-economic process leading to the creation of the AIs, and the smarter they are, the more likely it is they were created by a powerful company or a government.

This process itself shapes what the AIs are "aligned" to. Even if we solve some parts of the technical alignment problem, we still face the question of what sociotechnical process acts as the "principal". By the time we have superintelligent AI, the institutions creating it will have already been permeated by weaker AIs, decreasing human relevance and changing the incentive landscape.

The idea that the principal is you, personally, implies that a somewhat radical restructuring of society somehow happened before you got such AI and that individuals gained a lot of power currently held by super-human entities like bureaucracies, states or corporations. 

Also yes: it is true that capability jumps can lead to much sharper left turns. I think that risk is real and unacceptably high. I can easily agree that gradual disempowerment is most relevant in worlds where rapid loss of control does not happen first, but note that the gradual problem makes the risk of coups go up. There is actually substantial debate to be had here, and I'm excited about it.

Local Incentives

Let me get a bit more concrete and personal here. If you are a researcher at a frontier AI lab, I think it's not in your institution's self-interest for you to engage too deeply with gradual disempowerment arguments. These institutions were founded on worries about the power and technical risks of AGI, not worries about AI and the macroeconomy. They have some influence over technical development, and their 'how we win' plans were mostly crafted in a period when it seemed this was sufficient. It is very unclear whether they are helpful, or have much leverage, in the gradual disempowerment trajectories.

To give a concrete example: in my reading of Dario Amodei's "Machines of Loving Grace", one of the more important things to notice is not what is there, like the fairly detailed analysis of progress in biology, but what is not there, or is extremely vague. I appreciate that it is at least gestured at:

At that point (...a little past the point where we reach "a country of geniuses in a datacenter"...) our current economic setup will no longer make sense, and there will be a need for a broader societal conversation about how the economy should be organized.

So, we will have nice, specific things, like the prevention of Alzheimer's, or some safer, more reliable descendant of CRISPR curing most genetic disease in existing people. Also, we will need to have some conversation, because the human economy will be obsolete, and so will the incentives for states to care about people.

I love that it is a positive vision. Also, IDK, it seems like a kind of forced optimism about certain parts of the future. Yes, we can acknowledge specific technical challenges. Yes, we can worry about deceptive alignment or capability jumps. But questioning where the whole enterprise ends, even if everything works as intended? Seems harder to incorporate into institutional narratives and strategies.

Even for those not directly employed by AI labs, there are similar dynamics in the broader AI safety community. Careers, research funding, and professional networks are increasingly built around certain ways of thinking about AI risk. Gradual disempowerment doesn't fit neatly into these frameworks. It suggests we need different kinds of expertise and different approaches than what many have invested years developing. Academic incentives also currently do not point here - there are likely fewer than ten economists taking this seriously, and the trans-disciplinary nature of the problem makes it a hard sell as a grant proposal.

To be clear, this isn't about individual researchers making bad choices. It's about how institutional contexts shape what kinds of problems feel important or tractable, how the funding landscape shapes what people work on, and how memeplexes or 'schools of thought' shape attention. In a way, this itself illustrates some of the points about gradual disempowerment - how systems can shape human behavior and cognition in ways that reinforce their own trajectory.

Conclusion

Actually, I don't know what's really going on here. Mostly, in my life, I've seen a bunch of case studies of epistemic distortion fields - cases where incentives like money or power shape what people have trouble thinking about, or where memeplexes protect themselves from threatening ideas. The flinching moves I've described look quite similar to those patterns.


