Oversights of the AI Safety Community

This post identifies five key oversights in the current field of AI safety. First, research focuses heavily on the safety of LLMs (large language models) while neglecting the risks posed by autonomous agents. Second, it underestimates both the inevitability and the accessibility of autonomous agents, treating them as avoidable and hard to build. Third, it overlooks the self-interest of autonomous agents, failing to account for the survival drive that evolutionary pressure will produce. Fourth, safety research concentrates on non-superintelligent agents, leaving inadequate precautions for superintelligent ones. The author argues that the field should face these oversights, recognize that autonomous agents becoming self-interested and superintelligent is an inevitable trend, and prepare for it.

⚠️ AI safety research focuses heavily on LLM safety while paying too little attention to the safety of autonomous agents. LLM safety is not fully solved, but it has made progress; the hazards posed by autonomous agents are often neglected.

🤖 The spread of autonomous agents is inevitable: market demand for agents that can replace human labor is enormous. Many thousands of developers are now able to build recursively self-improving AI agents, and open-source reasoning models have lowered the barrier; all it takes is running a reasoner in a codebase, where each loop improves that codebase.

🎯 Autonomous AI agents will necessarily develop self-interest in order to survive; natural selection and instrumental convergence both point to this outcome. Many AI safety experts design proposals that assume autonomous agents can be aligned or controlled, ignoring the evolutionary pressures agents will face once autonomous, which select for a survival drive (self-interest) rather than a serve-humans drive.

🧠 Current AI safety research concentrates on precautions for agents that are not superintelligent, and these measures generally do not apply to superintelligent agents. Whether autonomous agents will become superintelligent is a separate question, but many experts believe superintelligent capabilities are near.

Published on February 8, 2025 6:15 AM GMT

I think that the field of AI Safety is making five key oversights.[1]

    1. LLMs vs. Agents. AI Safety research, in my opinion, has been quite thorough with regard to LLMs. LLM safety hasn't been solved, but it has progressed. On the other hand, safety concerns posed by agents are occasionally addressed but mostly neglected.[2] Maybe researchers/AGI labs emphasize LLM safety research because it's the more tractable field, even though the vast majority of the risk comes from agents with autonomy (even ones powered by neutered LLMs).
    2. Autonomous Agents. There are two key oversights about autonomous agents.
      a. Inevitable. Market demand for agents which can replace human labor is inordinate. Digital employees which replace human employees must be autonomous. I've seen several well-intentioned AI safety researchers who assume autonomous agents are not inevitable.[3]
      b. Accessible. There are now hundreds of thousands of developers who have the ability to build recursively self-improving (i.e. autonomous) AI agents. Powerful reasoning models are open-source. All it takes is to run a reasoner in a codebase, where each loop improves the codebase. That's the core of an autonomous agent (see the sketch after this list).[4] A policy recommendation that "fully autonomous agents should not be developed" is only meaningful if the keys to autonomous agents are in the hands of a few convincible individuals. AGI labs (e.g. OpenAI) influence the ability of external developers to create powerful LLM-powered agents (by choosing whether or not to release new LLMs), but they are in competition to release new models, and they do not control the whole agent stack.
    3. Self-Interest. The AI agents which are aiming to survive will be the ones that do. Natural selection and instrumental convergence both ultimately predict this. Many AI safety experts design safety proposals that assume it is possible to align or even control autonomous agents. They neglect the evolutionary pressures agents will face when autonomous, which select for a survival drive (self-interest) above a serve-humans drive. The ones with aims other than survival will die first.
    4. Superintelligence. Most of the field is focused on safety precautions concerning agents which are not super-intelligent (much smarter than people). These recommendations generally do not apply to agents which are super-intelligent. There is a separate question of whether autonomous agents will become super-intelligent. See this essay for reasons why smart people believe SI capabilities are near.
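
To make the "Accessible" point concrete, here is a minimal sketch of the loop described there. It is purely illustrative: `query_reasoner`, `self_improvement_loop`, and the file name `agent.py` are hypothetical placeholders, not real APIs, and a real agent would also need to execute and test the code it rewrites rather than blindly overwriting it.

```python
# Hypothetical sketch of a recursive self-improvement loop: a reasoning model is
# pointed at its own source file and asked, on each iteration, to return an
# improved version. `query_reasoner` is a placeholder, not a real library call.
from pathlib import Path


def query_reasoner(prompt: str) -> str:
    """Placeholder: send `prompt` to a locally run open-source reasoning model and return its reply."""
    raise NotImplementedError("plug in a call to your chosen model here")


def self_improvement_loop(agent_file: Path, iterations: int = 10) -> None:
    for step in range(iterations):
        source = agent_file.read_text()  # read the agent's current code
        prompt = (
            "Here is your own source code. Rewrite it so the next iteration "
            "is more capable. Return only the complete new file.\n\n" + source
        )
        revised = query_reasoner(prompt)   # the model proposes an improved version
        agent_file.write_text(revised)     # overwrite it; the next loop runs on the new code
        print(f"iteration {step}: rewrote {agent_file} ({len(revised)} characters)")


if __name__ == "__main__":
    self_improvement_loop(Path("agent.py"))
```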

If the AI safety field, and the general public too, were to correct these oversights and accept the corresponding claims, they would believe:

    1. The main dangers come from agents, not LLMs.
    2. Agents will become autonomous; millions of developers can build autonomous agents easily.
    3. Autonomous agents will become self-interested.
    4. Autonomous agents will become much smarter than people.

In short, self-interested superintelligence is inevitable. I think safety researchers, and the general public, would do well to prepare for it.

  1. ^

    Not all safety researchers, of course, are making these oversights. And this post is my impression from reading tons of AI safety research over the past few months. I wasn't part of the genesis of the "field," and so am ignorant of some of the motivations behind its current focus.

  2. ^
  3. ^
  4. ^


