What does success look like?

Published on January 23, 2025 5:48 PM GMT

The general movement around AI safety is currently pursuing many different agendas. These are individually very easy to motivate with some specific story of how things could naturally go wrong. For example:

I would further claim that this is how most people tend to think about each agenda most of the time. If you’re bought in on x-risk, it’s much easier to describe very specific failures than it is to describe very specific success stories.

But in the long term, I think trying to avoid all the losing conditions is a bad strategy. So I believe it’s pretty useful for anyone working on existential risk to at least consider what the success story is.

Win/lose asymmetry

There are many ways to lose and many ways to win, but crucially, we need to avoid all of the paths to failure, whereas we only need to achieve one of the paths to success. 
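
A minimal sketch to make the asymmetry concrete, assuming (purely for illustration) that the failure modes and the success paths are independent: if there are $n$ failure modes, the $i$-th occurring with probability $f_i$, and $m$ candidate paths to success, the $j$-th working out with probability $s_j$, then

$$\Pr[\text{avoid every failure}] = \prod_{i=1}^{n} (1 - f_i), \qquad \Pr[\text{at least one path succeeds}] = 1 - \prod_{j=1}^{m} (1 - s_j).$$

The left-hand quantity shrinks towards zero as failure modes accumulate, while the right-hand quantity grows towards one as viable success paths are added, which is the sense in which the two sides of the problem are asymmetric.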

Of course, as a matter of strategy, it’s probably smart to spread your bets, especially when things are so uncertain, and it is important that we consistently avoid failure.

Still, I think it’s easy for individuals to slip into working based on more easily-motivated stories about failure, and I think this:

It’s also actually pretty hard to describe and discuss good paths to success.

Nonetheless, I think that if humanity does eventually succeed, it's likely to be because at some point someone had an actual plan for how to succeed that some people actually followed, rather than just continually dodging mistakes and putting out fires. We can only defer these questions for so long.

Noticing the gaps

I think really engaging with the question of what success looks like is pretty connected with actually noticing the gaps in our current approach.

My impression is that there are a few pretty huge unanswered questions in alignment, including things like:

There’s a natural pull towards working ‘under the streetlight’ on the problems that seem easier to solve. I think it’s easy to not notice you’re doing this, and my impression is that the most reliable way to get out of that trap is to have really thought about what it would mean to solve the whole problem. 

Robust, agnostic work

It’s also possible to do work that is just generally useful, with that as your goal. For example:

I am in favour of this. But even here, I think it’s useful to have thought pretty hard at some point about what this is all building up to. Otherwise, there’s a risk that you:

I also think it’s really important to notice the skulls — I believe there are pretty compelling cases that a lot of work on each of the ‘agnostic’ approaches I’ve described above has ended up causing more harm than good: growth that damages the community, dual-use research, and social pressure causing value drift, for example.

So what does success look like?

The goal of this piece is mainly to spur people towards asking this question for themselves, about the work that they’re doing. Nonetheless, I’ll try to give some examples that currently seem salient to me:

I think all of these proposals have serious challenges and need a lot more work. And I would really like for that work to happen.


