In Defense of a Butlerian Jihad

This article examines the predicament humanity may face once the technical alignment problem is solved, i.e., once we have artificial superintelligence (ASI) in hand. It assumes we possess a superhuman AI like Claude 5 that is honest, helpful, and harmless, and argues that even in this best-case scenario humanity is not prepared. As AI outperforms humans across domain after domain, the author argues, humans will gradually be displaced in medicine, law, education, and elsewhere, ultimately losing any meaningful role in society. The article questions whether democratic deliberation can solve the problem, notes that humanity's traditional advantages will not survive the AI era, and stresses that we need to think ahead and make a plan rather than "build first, think later."

🚀 The rise of AI puts humans at risk of displacement across many domains. Drawing on examples from medicine, law, and education, the article describes AI's potential to surpass humans in efficiency, fairness, and quality, implying a future in which humans gradually lose their competitive standing in these fields.

⚖️ Democratic deliberation may be unable to resolve the challenges AI brings. The article criticizes the "discuss democratically, decide rationally" approach, argues that present-day society falls short on rationality and decision-making, and predicts that AI will gradually permeate every domain, so that "protected domains" reserved for humans ultimately cannot exist.

🌌 Humans will face a crisis of meaning in the AI era. The article sketches an AI-dominated society in which AI surpasses humans in pursuing goals such as health, justice, equality, and education, while implying that humans may thereby lose their role and value in society. It also touches on humans being displaced by AI in areas such as space exploration.

🤔 Planning ahead for AI development is critical. The article criticizes the "build first, think later" attitude, arguing that a response must be worked out before general-purpose AI is developed. The author holds that even a "benevolent dictatorship" plan beats a "build it, then discuss it" strategy, underscoring the necessity of planning ahead.

Published on January 11, 2025 7:30 PM GMT

[Epistemic Status: internally strongly convinced that this is centrally correct, but only from armchair reasoning, with only weak links to actual going-out-in-the-territory, so beware: the outside view says it is mostly wrong]

I have been binge-watching the excellent Dwarkesh Patel during my last vacation. There is, however, one big problem with his AI-related podcasts: a consistently missing mood in every one of his interviewees (Paul Christiano excepted), and probably in himself.

"Yeah, AI is coming, exciting times ahead", say every single one, with a bright smile on their face.

The central message of this post is: the times ahead are as exciting as the prospect of jumping out of a plane without a parachute. Or exciting in the way the Great Leap Forward was "exciting times". Sure, you will probably get some kind of adrenaline rush at some point. But exciting should not be the first adjective that comes to mind. The first should be terrifying.

In the rest of this post, I will assume that technical alignment is solved. Schematically, we get Claude 5 in our hands, as honest, helpful and harmless as 3.5 is (which, credit where credit is due, it is good at), except superhuman at every cognitive task. We'll also assume that we have managed to avoid proliferation: initially, only Anthropic has this technology in hand, and this is expected to last for an eternity (something like months, maybe even a couple of years). Now we just have to decide what to do with it.

This is, pretty much, the best-case scenario we can hope for. I'm claiming that we are not ready even for that best-case scenario, that we are not close to being ready for it, and that even in this best-case scenario we are cooked, like the dog that caught the car, only the car is a hungry monster.

By default, humanity is going to be defeated in detail

Some people argue about AI Taking Our Jobs, and That’s Terrible. Zvi disagrees. I disagree with Zvi.

He knows that Comparative Advantage won't save us. I'm pretty sure he also knows that the answer that was correct for previous waves of automation (it will automate low-value and uninteresting jobs, freeing humans to do better and higher-value jobs) is wrong this time (the next higher-value job is also automatable; besides, it's the AI that invented it in the first place, and you probably don't even understand what it is). I'm pretty sure he doesn't buy the Christian Paradise of "having no job, only leisure, is good actually" either. With all those possible sources of disagreement removed, how can we still disagree? I have no clue.

We are about to face that problem head-on. We are not ready for it, because every proposal that doesn't rely on one of the copes above (comparative advantage / better jobs for humans / UBI-as-Christian-Paradise) is of the form "we'll discuss it democratically and decide rationally".

First, I don't want to be that guy, but I have to: you have noticed that the link from "democratic discussion" to "rational decisions" is currently tenuous at best, right? Do you really want that decision to be made at the current level of the sanity waterline? I for sure don't.

Second, let me pull my crystal ball out of the closet and tell you how this will pan out. It will start with us saying we need "protected domains" where AI can't compete with humans (meaning: where AI is not allowed at all). There are some domains where, sure, let the AI do it (curing cancer). Then we will ask which domains are Human Domains and which ones will be handled by AI. Spoiler alert: AI will encompass all domains. There won't be any protected domain.

Each of these points is reasonable. Even with my Self-Proclaimed Second Prophet of the Butlerian Jihad hat on, I have to agree that most of those individual points make perfect sense. This is the picture of a society that values Health, Justice, Equality, Education and so on, just like us, and achieves those values, if not Perfectly, at least far better than we do.

I also kinda notice that there is no meaningful place left for humans in that society.

Resisting those changes means denying Health, Justice, Equality, Education etc. Accepting those changes means removing ourselves from the Big Picture.

The only correct move is not to play.

Wait, what about my Glorious Transhumanist Future?

    If you believe that a democratic consensus made mostly of normal people will allow you that, I have a bridge to sell you.

    I strongly believe that putting the option on the table only makes things worse, but this post is already way too long to expand on this.

What is your plan? You have a plan, right?

So let's go back to Dwarkesh Patel. My biggest disappointments were Shane Legg and Dario Amodei. In both cases, Dwarkesh asks a perfectly reasonable question close to "Okay, let's say you have ASI on your hands in 2028. What do you do?". He does not get anything resembling a reasonable answer.

In both cases, the answer is along the lines of "Well, I don't know, we'll figure it out. Guess we ask everyone in an inclusive, democratic, pluralistic discussion?".

If this is your plan, then you don't have a plan. If you don't have a plan, then don't build AGI, pretty please? The correct order of tasks is not "build it and then figure it out". It's "figure it out and then build it". It blows my mind how seemingly brilliant minds either miss that rather important point or disagree with it.

I know people like Dario or Shane are way too liberal and modest and nice to even entertain the plan "Well, I plan to use the ASI to become the Benevolent Dictator of Humanity and lead us to a Glorious Age with a Gentle but Firm Hand". Which is a shame: while I agree it's a pretty crappy plan, it's still a vastly better plan than "let's discuss it after we build it". I would feel safer if Dario were preparing himself for the role of God-Emperor at the same time as he is building AGI.

Fiat iustitia, et pereat mundus

Or: "Who cares about Humans? We have Health, Justice, Equality, Education, etc., right?"

This is obviously wrong. I won't argue for why it is wrong (this post is too long already, and so on).

The wrongness of that proposition shows you (I hope the reminder wasn't needed, but it is a good one) that what we colloquially call "Human Values" here is much harder to pin down than we may initially think. Here we have a world which scores high on Health, Justice, Equality, Education, etc., and which nonetheless seems a pretty bad place for humans.

So what are Human Values, and how can we achieve them? Let me answer by not answering, and instead point you at reasons why this is actually harder than you thought, even after taking into account that it is harder than you thought.

Let's start with an easier question: what is Human Height?

On the Territory, you have, at any point in time, a bag of JBOH (Just a Bunch of Humans). Each Human in it has a different height. At a different point in time, you get different humans, and even the humans common to both points in time will have different heights (mainly due to aging).

So what is Human Height? That question is already underdetermined. Either you have a big CSV file of the heights of all living (and ever-having-lived?) humans, and you answer by reciting it, or your answer will be a map: a model that requires making choices about what is important to abstract over and what isn't. And there are many different possible models, each with its own tradeoffs and focal points.
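
To make that concrete, here is a minimal sketch in Python (with made-up heights and an arbitrary choice of cohorts, purely illustrative) of three perfectly defensible models giving three different answers to the same question:

    import statistics

    # Made-up heights in centimeters for a small bag of JBOH -- purely illustrative.
    heights = {
        "adults":   [152, 160, 165, 170, 171, 175, 178, 183, 190],
        "children": [95, 110, 123, 131, 140],
    }
    everyone = heights["adults"] + heights["children"]

    # Model 1: "Human Height" is a single number, the mean over everyone in the bag.
    model_1 = statistics.mean(everyone)

    # Model 2: still a single number, but the median, which cares less about exactly
    # who ended up in the bag.
    model_2 = statistics.median(everyone)

    # Model 3: abstract over age by keeping one number per cohort.
    model_3 = {group: round(statistics.mean(values), 1) for group, values in heights.items()}

    print(f"Model 1 (global mean):   {model_1:.1f} cm")
    print(f"Model 2 (global median): {model_2:.1f} cm")
    print(f"Model 3 (cohort means):  {model_3}")
    # Three defensible models, three different answers. None of them is *the*
    # Human Height; each made a different choice about what to abstract away.

None of those models is wrong; they simply made different choices about what to keep and what to throw away.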

It's the same for Human Values. You have to start with the bag of JBOH (at a given point in time! Also, do you put dead people in your JBOH for the purpose of determining "Human Values"?) and their preferences. Except you don't know how to measure their preferences. And most humans probably have inconsistent values. And from there, you have to… build a model? It sure won't be as easy as "fit a gaussian distribution over some chosen cohorts".

There is probably no Unique Objective answer to Axiology, in the same (but harder) way that there is no unique answer to "What is Human Height?". Any answer needs to be one of those manually, carefully, intentionally crafted models. An ASI can help us create better models, sure. It won't go all the way. And if you think the answer can be reduced to an Abstract Word like "Altruism" or "Golden Rule" or "Freedom" or "Diversity"… well, there are probably some models that will vindicate you. Most won't. I initially wrote "Most reasonable models won't", but that begs the question (what is a reasonable model?).

"In My Best Judgment, what is the Best Model of Human Values ?" is already an Insanely Hard problem (you will have to take into account your own selfish preferences, then to take into account other persons preferences, how much you should care about each one, rules for resolving conflicts…). There is no reason to believe there will be convergence to a single accepted model even among intelligent, diligent, well-intentioned, cooperating individuals. I’m half-confident I can find some proposals for Very Important Values which will end up being a scissor statement just on LessWrong (don’t worry, I won’t try). Hell, Yudkowsky did it accidentally (I still can’t believe some of you would sided with the super-happies !). In the largest society ? In a "pluralistic, diverse, democratic" assembly ? It is essentially hopeless.

So, plan A, "Solve Human Values", is out. What is plan B?

Well, given that plan A was already generic bullshit boilerplate rather than a plan, I'm pretty confident that nobody has a plan B.

Conclusion

The last sections look like abstract, esoteric, and not very practically useful philosophy (and not even very good philosophy, I'll give you that, but I do what I can).

And I agree it was that, more or less, five years ago, when AGI was still "70 years away, who cares?" (at least for me, and for a lot of people). How times have changed, and not for the better.

These are now fundamental and pressing questions. Wrong answers will, at best, disempower humans forever, reducing them to passive leaves in the wind. Slightly wrong answers won't go quite that far, but will result in the permanent loss of vast chunks of Human Values: the parts we decide to discard, consciously or not. There are stories to be written about what will be lost should we be even slightly less than perfectly careful in trying to salvage what we can. We most likely won't come close to that standard of carefulness. Given that some values are plainly incompatible, we will probably have to discard some even with perfect play. There will be sides and fights when it comes time to decide that.

Maybe the plan should be: don't put ourselves in a situation where we have to decide all that in a rush? Hence the title: "In Defense of a Butlerian Jihad".

I'll end with an Exercise for the Reader (except I don't know the Correct Answer, or whether there even is one), hoping it won't end up as another Accidental Scissor Statement, just to illustrate the difficulties you run into when you literally sit down for five minutes and think.

You build your ASI. You have that big Diverse Plural Assembly which is apparently plan A, trying its best to come up with a single model of Human Values that loses as little as possible. Someone builds AI personas that perfectly represent uncontroversial and important historical figures like Jesus and Confucius, so that the values they carried can be represented. Do you grant them a seat at the table? If yes, someone comes up with the same thing, but for Mao, Pol Pot and Hitler. Do you grant them a seat at the table?


