AI safety tax dynamics

This post explores the interaction between the automation of AI research, which accelerates technological progress, and the risk of losing control of AI systems. The author argues that automating AI research may change the environment in which more powerful systems emerge, so differentially accelerating beneficial applications could be a high-leverage way to improve safety. The author also expects the AI safety tax to peak when AI is lightly to moderately superintelligent, because at that stage AI's capacity to automate capabilities work may far outstrip its capacity to automate safety research, and that gap could drive a sharp rise in risk.

💥 **How automating AI research affects safety:** Automating AI research will change the environment in which more powerful AI systems emerge, so prioritizing the development of beneficial applications may be a high-leverage strategy for improving safety.

🤖 **The dynamic nature of the AI safety tax:** Unlike other technologies, AI safety is a dynamic problem. As AI capabilities grow, AI itself will likely automate much of both capabilities and safety research. The amount of effective capabilities work will therefore depend on investment in capabilities work and on how efficiently that work can be automated; likewise, the amount of effective safety work depends not only on investment in safety work but also on how efficiently it can be automated.

⚠️ **The safety tax peak:** The author argues that the safety tax for early artificial general intelligence (AGI) is low, because early AGI's limited capabilities make it hard for it to threaten humanity. As capabilities grow, the safety tax is likely to rise, peaking when AI is lightly to moderately superintelligent: at that stage AI's capacity to automate capabilities work may far outstrip safety research, and the gap could drive a sharp rise in risk. Once AI reaches strong superintelligence, the safety tax may fall again, because effective oversight and alignment techniques may have been developed by then.

🚀 **Risks from superintelligent AI:** Lightly to moderately superintelligent systems would pose enormous risks if they are not well aligned, so techniques for aligning AI systems need to be developed in practice. Although these techniques may be superseded by new AI paradigms, some meta-level practices may remain relevant.

📈 **The automation gap between capabilities and safety research:** As AI develops, capabilities research may be automated faster than safety research, because high-quality metrics are easier to design for capabilities work, while safety research requires deeper understanding and philosophical competence. As AI capabilities improve, however, this gap may narrow, since AI may become able to design simple metrics that measure safety well.

🤝 **Coordination and cooperation:** As AI capabilities grow, more efficient coordination across labs or at the international level could increase our ability to pay a high safety tax, but this depends on whether that capacity can be built before the peak demand for safety taxes arrives.

🧭 **The safety tax and the direction of technological development:** The safety tax is closely tied to the direction in which technology develops. If we can better understand the potential dangers of AI and devise more effective safety measures, we will be better placed to handle the risks AI brings.

💡 **The importance of AI safety research:** The author holds that AI safety is an important topic deserving more discussion. Compared with debating the total magnitude of AI risk, asking where the safety tax peaks along the development curve is more action-guiding and more neglected.

🌟 **The future of AI safety research:** The author expects AI safety research to become increasingly important as AI capabilities grow. We will need to keep exploring new techniques and methods to ensure AI remains safe and aligned, and to develop more complete ethical norms for AI's future development.

Published on October 23, 2024 12:18 PM GMT

Two important themes in many discussions of the future of AI are:

    1. AI will automate research, and thus accelerate technological progress
    2. There are serious risks from misaligned AI systems (that justify serious investments in safety)

How do these two themes interact? Especially: how should we expect the safety tax requirements to play out as progress accelerates and we see an intelligence explosion?

In this post I’ll give my core views on this:

I developed these ideas in tandem with my exploration of the concept of safety tax landscapes, which I wrote about in a recent post. However, for people who are just interested in the implications for AI, I think this post will largely stand alone.

How AI differs from other dangerous technologies

In the post on safety tax functions, my analysis was about a potentially-dangerous technology in the abstract (nothing specific about AI). We saw that:

For most technologies, these abilities — the ability to invest in different aspects of the tech, and the ability to coordinate — are relatively independent of the technology; better solar power doesn’t do much to help us do more research, or sign better treaties. Not so for AI! To a striking degree, AI safety is a dynamic problem — earlier capabilities might change the basic nature of the problem we are later facing. 

In particular:

These are, I believe, central cases of the potential value of differential technological development (or d/acc) in AI. I think this is an important topic, and it’s one I expect to return to in future articles.
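To make the "dynamic problem" framing above slightly more concrete, here is one minimal way to write it down. The notation is mine, not the author's: $I_C$ and $I_S$ stand for investment in capabilities and safety work, $m_C(c)$ and $m_S(c)$ are assumed automation multipliers that depend on the capability level $c$ already reached, and "safety tax" is read (as one common interpretation) as the share of resources diverted to safety.

```latex
% Illustrative notation only (not from the original post):
%   I_C, I_S        : investment in capabilities work and in safety work
%   m_C(c), m_S(c)  : automation multipliers at capability level c
\begin{align*}
  C_{\mathrm{eff}}(c) &= m_C(c)\, I_C && \text{(effective capabilities work)} \\
  S_{\mathrm{eff}}(c) &= m_S(c)\, I_S && \text{(effective safety work)} \\
  \text{safety tax}   &\approx \frac{I_S}{I_C + I_S} && \text{(share of resources going to safety)}
\end{align*}
```

On this reading, the concern discussed later in the post is that $m_C(c)$ may grow faster than $m_S(c)$ through the lightly-to-moderately superintelligent range, so a fixed nominal safety tax buys relatively less effective safety work exactly when risk is highest.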

Where is the safety tax peak for AI?

Why bother with the conceptual machinery of safety tax functions? A lot of the reason I spent time thinking about it was trying to get a handle on this question — which parts of the AI development curve should we be most concerned about?

I think this is a crucial question for thinking about AI safety, and I wish it had more discussion. Compared to talking about the total magnitude of the risks, I think this question is more action-guiding, and also more neglected.

In terms of my own takes, it seems to me that:

On net, my picture looks very approximately like this:

(I think this graph will probably make rough intuitive sense by itself, but if you want more details about what the axes and contours are supposed to mean, see the post on safety tax functions.) 
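The figure itself has not survived in this copy, so here is a rough stand-in showing how such a landscape could be drawn. To be clear, the risk surface `toy_risk` below (a single risk peak around mild-to-moderate superintelligence, pushed down by paying more safety tax) is my own invented illustration of what the axes and contours mean, not the author's actual figure or numbers.

```python
# Toy illustration only: the risk surface below is invented, not taken from
# the post. It just shows what the axes and contours of a "safety tax
# landscape" could look like.
import numpy as np
import matplotlib.pyplot as plt

capability = np.linspace(0.0, 1.0, 200)   # 0 ~ early AGI, 1 ~ strongly superintelligent
safety_tax = np.linspace(0.0, 1.0, 200)   # share of resources paid as safety tax
C, S = np.meshgrid(capability, safety_tax)

def toy_risk(c, s):
    # Assumed shape: risk (with little safety work) peaks somewhere in the
    # mildly-to-moderately superintelligent range, and paying a higher
    # safety tax reduces risk at every capability level.
    baseline = np.exp(-((c - 0.6) ** 2) / 0.05)  # single peak around c = 0.6
    return baseline * (1.0 - s) ** 2             # more safety tax -> less risk

fig, ax = plt.subplots()
contours = ax.contour(C, S, toy_risk(C, S), levels=8)
ax.clabel(contours, inline=True, fontsize=8)
ax.set_xlabel("capability level (early AGI -> strongly superintelligent)")
ax.set_ylabel("safety tax paid (share of resources)")
ax.set_title("Toy safety-tax landscape (illustrative only)")
plt.show()
```

With this invented surface, staying below any given risk contour demands the most safety tax around the peak of the baseline curve, which is the shape of the claim being made about the lightly-to-moderately superintelligent period.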

I’m not super confident in these takes, but it seems better to be wrong than vague — if it’s good to have more conversations about this, I’d rather offer something to kick things off than not. If you think this picture is wrong — and especially if you think the peak risk lies somewhere else — I’d love to hear about that.

And if this picture is right — then what? I suppose I would like to see more work which is targeting this period.[2] This shouldn’t mean stopping safety work for early AGI — that’s the first period with appreciable risk, and it can’t be addressed later. But it should mean increasing political work which lays the groundwork for coordinating to pay high safety taxes in the later period. And it should mean working to differentially accelerate those beneficial applications of AI that may help us to navigate the period well.

Acknowledgements: Thanks to Tom Davidson, Rose Hadshar, and Raymond Douglas for helpful comments.

  1. ^

    Of course “around as smart as humans” is a vague term; I’ll make it slightly less vague by specifying “at research and strategic planning”, which I think are the two most strategically important applications of AI.

  2. ^

    This era may roughly coincide with the last era of human mistakes — since AI abilities are likely to be somewhat spiky compared to humans, we’ll probably have superintelligence in many important ways before human competence is completely obsoleted. So the interventions for helping I discussed in that post may be relevant here. However, I painted a somewhat particular picture in that post, which I expect to be wrong in some specifics; whereas here I’m trying to offer a more general analysis.



