In Defense of a Butlerian Jihad

This article examines the predicament humanity may face once the technical alignment problem is solved, i.e., once we have artificial superintelligence (ASI) in hand. It assumes we possess a superhuman AI like Claude 5 that is honest, helpful, and harmless, and argues that even in this best-case scenario humanity is not prepared. As AI outperforms humans across domain after domain, the author argues, humans will gradually be displaced in medicine, law, education, and elsewhere, ultimately losing any meaningful role in society. The article questions whether democratic deliberation can solve the problem, notes that humanity's traditional advantages will not survive the AI era, and stresses that we need to think ahead and make a plan rather than "build first, think later."

🚀 The rise of AI puts humans at risk of displacement across many domains. Drawing on examples from medicine, law, and education, the article describes AI's potential to surpass humans in efficiency, fairness, and quality, implying a future in which humans gradually lose their competitive standing in these fields.

⚖️ Democratic deliberation may be unable to resolve the challenges AI brings. The article criticizes the "discuss democratically, decide rationally" approach, argues that present-day society falls short on rationality and decision-making, and predicts that AI will gradually permeate every domain, so that "protected domains" reserved for humans ultimately cannot exist.

🌌 Humans will face a crisis of meaning in the AI era. The article sketches an AI-dominated society in which AI surpasses humans in pursuing goals such as health, justice, equality, and education, while implying that humans may thereby lose their role and value in society. It also touches on humans being displaced by AI in areas such as space exploration.

🤔 Planning ahead for AI development is critical. The article criticizes the "build first, think later" attitude, arguing that a response must be worked out before general-purpose AI is developed. The author holds that even a "benevolent dictatorship" plan beats a "build it, then discuss it" strategy, underscoring the necessity of planning ahead.

Published on January 11, 2025 7:30 PM GMT

[Epistemic Status: internally strongly convinced that this is centrally correct, but only from armchair reasoning, with only weak links to actual going-out-in-the-territory, so beware: the outside view says it is mostly wrong]

I have been binge-watching the excellent Dwarkesh Patel during my last vacation. There is, however, one big problem with his AI-related podcasts: a consistently missing mood in every one of his interviewees (Paul Christiano excepted), and probably in himself.

"Yeah, AI is coming, exciting times ahead", say every single one, with a bright smile on their face.

The central message of this post is: the times ahead are as exciting as the prospect of jumping out of a plane without a parachute. Or exciting in the way the Great Leap Forward was "exciting times". Sure, you will probably get some kind of adrenaline rush at some point. But exciting should not be the first adjective that comes to mind. The first should be terrifying.

In the rest of this post, I will assume that technical alignment is solved. Schematically, we get Claude 5 in our hands, as honest, helpful and harmless as 3.5 is (which, credit where credit is due, it is good at), except superhuman at every cognitive task. We'll also assume that we have managed to avoid proliferation: initially, only Anthropic has this technology in hand, and this is expected to last for an eternity (something like months, maybe even a couple of years). Now we just have to decide what to do with it.

This is, pretty much, the best-case scenario we can hope for. I'm claiming that we are not ready even for that best-case scenario, that we are not close to being ready for it, and that even in this best-case scenario we are cooked, like the dog that caught the car, only the car is a hungry monster.

By default, humanity is going to be defeated in detail

Some people argue about AI Taking Our Jobs, and That’s Terrible. Zvi disagrees. I disagree with Zvi.

He knows that Comparative Advantage won't save us. I'm pretty sure he also knows that the answer that was correct for previous waves of automation (it will automate low-value and uninteresting jobs, freeing humans to do better and higher-value jobs) is wrong this time (the next higher-value job is also automatable; besides, it's the AI that invented it in the first place, and you probably don't even understand what it is). I'm pretty sure he doesn't buy the Christian Paradise of "having no job, only leisure, is good actually" either. With all those possible sources of disagreement removed, how can we still disagree? I have no clue.

We are about to face that problem head-on. We are not ready for it, because every proposal that doesn't rely on one of the copes above (comparative advantage / better jobs for humans / UBI-as-Christian-Paradise) is of the form "we'll discuss it democratically and decide rationally".

First, I don't want to be that guy, but I have to: you have noticed that the link from "democratic discussion" to "rational decisions" is currently tenuous at best, right? Do you really want that decision to be made at the current level of the sanity waterline? I for sure don't.

Second, let me pull my crystal ball out of the closet and tell you how this will pan out. It will start with us saying we need "protected domains" where AI can't compete with humans (meaning: where AI is not allowed at all). There are some domains where, sure, let the AI do it (curing cancer). Then we will ask which domains are Human Domains and which ones will be handled by AI. Spoiler alert: AI will encompass all domains. There won't be any protected domain.

Each of these points is reasonable. Even with my Self-Proclaimed Second Prophet of the Butlerian Jihad hat on, I have to agree that most of those individual points make perfect sense. This is the picture of a society that values Health, Justice, Equality, Education and so on, just like us, and achieves those values, if not Perfectly, at least far better than we do.

I also kinda notice that there is no meaningful place left for humans in that society.

Resisting those changes means denying Health, Justice, Equality, Education etc. Accepting those changes means removing ourselves from the Big Picture.

The only correct move is not to play.

Wait, what about my Glorious Transhumanist Future?

    If you believe that a democratic consensus made mostly of normal people will allow you that, I have a bridge to sell you.

    I strongly believe that putting the option on the table only makes things worse, but this post is already way too long to expand on this.

What is your plan? You have a plan, right?

So let's go back to Dwarkesh Patel. My biggest disappointments were Shane Legg and Dario Amodei. In both cases, Dwarkesh asks a perfectly reasonable question close to "Okay, let's say you have ASI on your hands in 2028. What do you do?". He does not get anything resembling a reasonable answer.

In both cases, the answer is along the lines of "Well, I don't know, we'll figure it out. Guess we ask everyone in an inclusive, democratic, pluralistic discussion?".

If this is your plan, then you don't have a plan. If you don't have a plan, then don't build AGI, pretty please? The correct order of tasks is not "build it and then figure it out". It's "figure it out and then build it". It blows my mind how seemingly brilliant minds either miss that rather important point or disagree with it.

I know people like Dario or Shane are way too liberal and modest and nice to even entertain the plan "Well, I plan to use the ASI to become the Benevolent Dictator of Humanity and lead us to a Glorious Age with a Gentle but Firm Hand". Which is a shame: while I agree it's a pretty crappy plan, it's still a vastly better plan than "let's discuss it after we build it". I would feel safer if Dario were preparing himself for the role of God-Emperor at the same time as he is building AGI.

Fiat iustitia, et pereat mundus

Or: "Who cares about Humans? We have Health, Justice, Equality, Education, etc., right?"

This is obviously wrong. I won't argue for why it is wrong (this post is too long already, and so on).

The wrongness of that proposition shows you (I hope the reminder wasn't needed, but it is a good one) that what we colloquially call "Human Values" here is much harder to pin down than we may initially think. Here we have a world which scores high on Health, Justice, Equality, Education, etc., and which nonetheless seems a pretty bad place for humans.

So what are Human Values, and how can we achieve them? Let me answer by not answering, and instead point you at reasons why this is actually harder than you thought, even after taking into account that it is harder than you thought.

Let's start with an easier question: what is Human Height?

On the Territory, you have, at any point in time, a bag of JBOH (Just a Bunch of Humans). Each Human in it has a different height. At a different point in time, you get different humans, and even the humans common to both points in time will have different heights (mainly due to aging).

So what is Human Height? That question is already underdetermined. Either you have a big CSV file of the heights of all living (and ever-having-lived?) humans, and you answer by reciting it, or your answer will be a map: a model that requires making choices about what is important to abstract over and what isn't. And there are many different possible models, each with its own tradeoffs and focal points.
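
To make that concrete, here is a minimal sketch in Python (with made-up heights and an arbitrary choice of cohorts, purely illustrative) of three perfectly defensible models giving three different answers to the same question:

    import statistics

    # Made-up heights in centimeters for a small bag of JBOH -- purely illustrative.
    heights = {
        "adults":   [152, 160, 165, 170, 171, 175, 178, 183, 190],
        "children": [95, 110, 123, 131, 140],
    }
    everyone = heights["adults"] + heights["children"]

    # Model 1: "Human Height" is a single number, the mean over everyone in the bag.
    model_1 = statistics.mean(everyone)

    # Model 2: still a single number, but the median, which cares less about exactly
    # who ended up in the bag.
    model_2 = statistics.median(everyone)

    # Model 3: abstract over age by keeping one number per cohort.
    model_3 = {group: round(statistics.mean(values), 1) for group, values in heights.items()}

    print(f"Model 1 (global mean):   {model_1:.1f} cm")
    print(f"Model 2 (global median): {model_2:.1f} cm")
    print(f"Model 3 (cohort means):  {model_3}")
    # Three defensible models, three different answers. None of them is *the*
    # Human Height; each made a different choice about what to abstract away.

None of those models is wrong; they simply made different choices about what to keep and what to throw away.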

It's the same for Human Values. You have to start with the bag of JBOH (at a given point in time! Also, do you put dead people in your JBOH for the purpose of determining "Human Values"?) and their preferences. Except you don't know how to measure their preferences. And most humans probably have inconsistent values. And from there, you have to… build a model? It sure won't be as easy as "fit a gaussian distribution over some chosen cohorts".

There is probably no Unique Objective answer to Axiology, in the same (but harder) way that there is no unique answer to "What is Human Height?". Any answer needs to be one of those manually, carefully, intentionally crafted models. An ASI can help us create better models, sure. It won't go all the way. And if you think the answer can be reduced to an Abstract Word like "Altruism" or "Golden Rule" or "Freedom" or "Diversity"… well, there are probably some models that will vindicate you. Most won't. I initially wrote "Most reasonable models won't", but that begs the question (what is a reasonable model?).

"In My Best Judgment, what is the Best Model of Human Values ?" is already an Insanely Hard problem (you will have to take into account your own selfish preferences, then to take into account other persons preferences, how much you should care about each one, rules for resolving conflicts…). There is no reason to believe there will be convergence to a single accepted model even among intelligent, diligent, well-intentioned, cooperating individuals. I’m half-confident I can find some proposals for Very Important Values which will end up being a scissor statement just on LessWrong (don’t worry, I won’t try). Hell, Yudkowsky did it accidentally (I still can’t believe some of you would sided with the super-happies !). In the largest society ? In a "pluralistic, diverse, democratic" assembly ? It is essentially hopeless.

So, plan A, "Solve Human Values", is out. What is plan B?

Well, given that plan A was already generic bullshit boilerplate rather than a plan, I'm pretty confident that nobody has a plan B.

Conclusion

The last sections look like abstract, esoteric, and not very practically useful philosophy (and not even very good philosophy, I'll give you that, but I do what I can).

And I agree it was that, more or less, five years ago, when AGI was still "70 years away, who cares?" (at least for me, and for a lot of people). How times have changed, and not for the better.

These are now fundamental and pressing questions. Wrong answers will, at best, disempower humans forever, reducing them to passive leaves in the wind. Slightly wrong answers won't go quite that far, but will result in the permanent loss of vast chunks of Human Values: the parts we decide to discard, consciously or not. There are stories to be written about what will be lost should we be even slightly less than perfectly careful in trying to salvage what we can. We most likely won't come close to that standard of carefulness. Given that some values are plainly incompatible, we will probably have to discard some even with perfect play. There will be sides and fights when it comes time to decide that.

Maybe the plan should be: don't put ourselves in a situation where we have to decide all that in a rush? Hence the title: "In Defense of a Butlerian Jihad".

I'll end with an Exercise for the Reader (except I don't know the Correct Answer, or whether there even is one), hoping it won't end up as another Accidental Scissor Statement, just to illustrate the difficulties you run into when you literally sit down for five minutes and think.

You build your ASI. You have that big Diverse Plural Assembly which is apparently plan A, trying its best to come up with a single model of Human Values that loses as little as possible. Someone builds AI personas that perfectly represent uncontroversial and important historical figures like Jesus and Confucius, so that the values they carried can be represented. Do you grant them a seat at the table? If yes, someone comes up with the same thing, but for Mao, Pol Pot and Hitler. Do you grant them a seat at the table?


