What does success look like?

Published on January 23, 2025 5:48 PM GMT

The general movement around AI safety is currently pursuing many different agendas. These are individually very easy to motivate with some specific story of how things could naturally go wrong. For example:

I would further claim that this is how most people tend to think about each agenda most of the time. If you’re bought in on x-risk, it’s much easier to describe very specific failures than it is to describe very specific success stories.

But in the long term, I think trying to avoid all the losing conditions is a bad strategy. So I believe it’s pretty useful for anyone working on existential risk to at least consider what the success story is.

Win/lose asymmetry

There are many ways to lose and many ways to win, but crucially, we need to avoid all of the paths to failure, whereas we only need to achieve one of the paths to success. 
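
A minimal sketch to make the asymmetry concrete, assuming (purely for illustration) that the failure modes and the success paths are independent: if there are $n$ failure modes, the $i$-th occurring with probability $f_i$, and $m$ candidate paths to success, the $j$-th working out with probability $s_j$, then

$$\Pr[\text{avoid every failure}] = \prod_{i=1}^{n} (1 - f_i), \qquad \Pr[\text{at least one path succeeds}] = 1 - \prod_{j=1}^{m} (1 - s_j).$$

The left-hand quantity shrinks towards zero as failure modes accumulate, while the right-hand quantity grows towards one as viable success paths are added, which is the sense in which the two sides of the problem are asymmetric.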

Of course, as a matter of strategy, it’s probably smart to spread your bets, especially when things are so uncertain, and it is important that we consistently avoid failure.

Still, I think it’s easy for individuals to slip into working based on more easily-motivated stories about failure, and I think this:

It’s also actually pretty hard to describe and discuss good paths to success.

Nonetheless, I think that if humanity does eventually succeed, it's likely to be because at some point someone had an actual plan for how to succeed that some people actually followed, rather than just continually dodging mistakes and putting out fires. We can only defer these questions for so long.

Noticing the gaps

I think really engaging with the question of what success looks like is pretty connected with actually noticing the gaps in our current approach.

My impression is that there are a few pretty huge unanswered questions in alignment, including things like:

There’s a natural pull towards working ‘under the streetlight’ on the problems that seem easier to solve. I think it’s easy to not notice you’re doing this, and my impression is that the most reliable way to get out of that trap is to have really thought about what it would mean to solve the whole problem. 

Robust, agnostic work

It’s also possible to do work that is just generally useful, with that as your goal. For example:

I am in favour of this. But even here, I think it’s useful to have thought pretty hard at some point about what this is all building up to. Otherwise, there’s a risk that you:

I also think it’s really important to notice the skulls — I believe there are pretty compelling cases that a lot of work on each of the ‘agnostic’ approaches I’ve described above has ended up causing more harm than good: growth that damages the community, dual-use research, and social pressure causing value drift, for example.

So what does success look like?

The goal of this piece is mainly to spur people towards asking this question for themselves, about the work that they’re doing. Nonetheless, I’ll try to give some examples that currently seem salient to me:

I think all of these proposals have serious challenges and need a lot more work. And I would really like for that work to happen.


