The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better

 


Published on February 21, 2025 8:15 PM GMT

First, let me quote my ancient post on the topic:

Effective Strategies for Changing Public Opinion

The titular paper is very relevant here. I'll summarize a few points.

    - The main two forms of intervention are persuasion and framing.
    - Persuasion is, to wit, an attempt to change someone's set of beliefs, either by introducing new ones or by changing existing ones.
    - Framing is a more subtle form: an attempt to change the relative weights of someone's beliefs, by emphasizing different aspects of the situation, recontextualizing it.
    - There's a dichotomy between the two. Persuasion is found to be very ineffective if used on someone with high domain knowledge. Framing-style arguments, on the other hand, are more effective the more the recipient knows about the topic.
    - Thus, persuasion is better used on non-specialists, and it's most advantageous the first time it's used. If someone tries it and fails, they raise the recipient's domain knowledge, and a second persuasion attempt will be correspondingly hampered. Cached thoughts are also in effect.
    - Framing, conversely, is better for specialists.

My sense is that, up to this point, AI risk advocacy has targeted the following groups of people: ML researchers and academics, policymakers, and the terminally online.

Persuasion

I think none of the above demographics are worth trying to persuade further at this point in time. Persuasion was very productive before, when they didn't yet have high domain knowledge related to AI Risk specifically, and there have been some major wins.

But further work in this space (and therefore work on all corresponding advocacy methods, yes) is likely to have ~no value.

Among those groups, we've already convinced ~everyone we were ever going to convince. That work was valuable and high-impact, but the remnants aren't going to budge in response to any evidence short of a megadeath AI catastrophe.[1]

Hell, I am 100% behind the AI X-risk being real, and even I'm getting nauseated at how tone-deaf, irrelevant, and impotent the arguments for it sound nowadays, in the spaces in which we keep trying to make them.

 

A Better Target Demographic

Here's whom we should actually be trying to convince (or rather, inform): normal people. The General Public.

If we can raise awareness of AGI Doom among the actual general public (again, not the small demographic of terminally online people), that will create significant political pressure on the USG, giving politicians an incentive to have platforms addressing the risks.

The only question is how to do that. I don't have a solid roadmap here. But it's not by writing viral LW/Xitter blog posts.

Some scattershot thoughts:

- Most people's mental model of "AI" is still chatbots. Concepts like AI accelerating AI research, and what follows from that, aren't on their radar at all, so the messaging has to be accessible rather than technical.
- Comedians, newspapers, and podcasts reach this demographic far better than blog posts do.
- Getting people to ask "who, exactly, is going to control the ASI?" plugs into their existing distrust of governments and corporations.

Overall, I expect that there's a ton of low-hanging, high-impact fruit in this space, and even more clever high-impact interventions are possible (in the vein of harfe's idea).

 

Extant Projects in This Space?

Some relevant ones I've heard about:

 

Framing

Technically, I think there might be some hope for appealing to researchers/academics/politicians/the terminally online, by reframing the AI Risk concerns in terms they would like more.

All the talk about "safety" and "pauses" has led to us being easy to misinterpret as unambitious, technology-concerned, risk-averse luddites. That's of course incorrect. I, at least, am 100% onboard with enslaving god, becoming immortal, merging with the machines, eating the galaxies, perverting the natural order to usher in an unprecedented age of prosperity, forcing the wheels of time into reverse to bring the dead back to life, and all that good stuff. I am pretty sure most of us are like this (if perhaps not in those exact terms).

The only reason I/we are not accelerationists is because the current direction of AI progress is not, in fact, on the track to lead us to that glorious future. It's instead on the track to get us all killed like losers.

So a more effective communication posture might be to emphasize this: frame the current AI paradigm as a low-status sucker's game, and suggest alternative avenues for grabbing power. Uploads, superbabies, adult intelligence enhancement, more transparent/Agent Foundations-y AI research, etc. Reframing "AI Safety" as being about high-fidelity AI Control might also be useful. (It's mostly about making AIs Do What You Mean, after all, and the best alignment work is almost always dual-use.)

If the current paradigm of AI capability advancement visibly stumbles in its acceleration[3], this type of messaging would become even more effective. The black-box DL paradigm would open itself to derision for being a bubble, an empty promise.

I mention this reluctantly/for comprehensiveness' sake. I think that this is a high-variance approach, most of the attempts at this are going to land badly, and will amount to nothing or have a negative effect. But it is a possible option.

Messaging aimed at the general public is nevertheless a much better, and more neglected, avenue.

  1. ^

    Or maybe not even then; see the Law of Continued Failure.

  2. ^

    The toy model there is roughly:

      - Protest 1 is made up of some number of people $n_1$ who are willing to show their beliefs in public even with the support of zero other people.
      - Protest 2 is joined by $n_2$ people who are willing to show their beliefs in public if they have the support of $n_1$ other people.
      - ...
      - Protest $k$ is joined by $n_k$ people who are willing to show their beliefs in public if they have the support of $n_1 + \dots + n_{k-1}$ other people.

    (Source, Ctrl+F in the transcript for "second moving part is diverse threshold".)
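
    To make that cascade dynamic concrete, here's a minimal simulation sketch of the threshold model above. It's not from the post; the function name and the example numbers are made up for illustration.

```python
def cascade_size(thresholds: list[int]) -> int:
    """How many people end up protesting, given each person's threshold:
    'I'll show my beliefs in public once at least this many others already are.'"""
    participants = 0
    remaining = sorted(thresholds)
    while remaining and remaining[0] <= participants:
        # Everyone whose threshold is met by the current crowd joins this round.
        joiners = sum(1 for t in remaining if t <= participants)
        participants += joiners
        remaining = remaining[joiners:]
    return participants

# A few people willing to protest alone can trigger a much larger cascade...
print(cascade_size([0, 0, 0] + [3] * 10 + [13] * 50))  # -> 63
# ...but without that zero-threshold seed, nobody moves at all.
print(cascade_size([3] * 10 + [13] * 50))              # -> 0
```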

  3. ^

    Which I do mostly expect. AGI does not seem just around the corner on my inside model of AI capabilities. The current roadmap seems to be "scale inference-time compute, build lots of RL environments, and hope that God will reward those acts of devotion by curing all LLM ailments and blessing them with generalization". Which might happen, DL is weird. But I think there's a lot of room for skepticism with that idea.

    I think the position that The End Is Nigh is being deliberately oversold by powerful actors: the AGI Labs. It's in their corporate interests to signal hype to attract investment, regardless of how well research is actually progressing. So the mere fact that they're acting optimistic carries no information.

    And those of us concerned about relevant X-risks are uniquely vulnerable to buying into that propaganda. Just with the extra step of transmuting the hype into despair. We're almost exactly the people this propaganda is optimized for, after all – and we're not immune to it.



