Oversights of the AI Safety Community

This post identifies five key oversights in the current field of AI safety. First, research focuses heavily on the safety of LLMs (large language models) while neglecting the risks posed by autonomous agents. Second, it underestimates both the inevitability and the accessibility of autonomous agents, treating them as avoidable and hard to build. Third, it overlooks the self-interest of autonomous agents, failing to account for the survival drive that evolutionary pressure will produce. Fourth, safety research concentrates on non-superintelligent agents, leaving inadequate precautions for superintelligent ones. The author argues that the field should face these oversights, recognize that autonomous agents becoming self-interested and superintelligent is an inevitable trend, and prepare for it.

⚠️ AI safety research focuses heavily on LLM safety while paying too little attention to the safety of autonomous agents. LLM safety is not fully solved, but it has made progress; the hazards posed by autonomous agents are often neglected.

🤖 The spread of autonomous agents is inevitable: market demand for agents that can replace human labor is enormous. Many thousands of developers are now able to build recursively self-improving AI agents, and open-source reasoning models have lowered the barrier; all it takes is running a reasoner in a codebase, where each loop improves that codebase.

🎯 Autonomous AI agents will necessarily develop self-interest in order to survive; natural selection and instrumental convergence both point to this outcome. Many AI safety experts design proposals that assume autonomous agents can be aligned or controlled, ignoring the evolutionary pressures agents will face once autonomous, which select for a survival drive (self-interest) rather than a serve-humans drive.

🧠 Current AI safety research concentrates on precautions for agents that are not superintelligent, and these measures generally do not apply to superintelligent agents. Whether autonomous agents will become superintelligent is a separate question, but many experts believe superintelligent capabilities are near.

Published on February 8, 2025 6:15 AM GMT

I think that the field of AI Safety is making five key oversights.[1]

    1. LLMs vs. Agents. AI Safety research, in my opinion, has been quite thorough with regard to LLMs. LLM safety hasn't been solved, but it has progressed. On the other hand, safety concerns posed by agents are occasionally addressed but mostly neglected.[2] Maybe researchers/AGI labs emphasize LLM safety research because it's the more tractable field, even though the vast majority of the risk comes from agents with autonomy (even ones powered by neutered LLMs).
    2. Autonomous Agents. There are two key oversights about autonomous agents.
      a. Inevitable. Market demand for agents which can replace human labor is inordinate. Digital employees which replace human employees must be autonomous. I've seen several well-intentioned AI safety researchers who assume autonomous agents are not inevitable.[3]
      b. Accessible. There are now hundreds of thousands of developers who have the ability to build recursively self-improving (i.e. autonomous) AI agents. Powerful reasoning models are open-source. All it takes is to run a reasoner in a codebase, where each loop improves the codebase. That's the core of an autonomous agent (see the sketch after this list).[4] A policy recommendation that "fully autonomous agents should not be developed" is only meaningful if the keys to autonomous agents are in the hands of a few convincible individuals. AGI labs (e.g. OpenAI) influence the ability of external developers to create powerful LLM-powered agents (by choosing whether or not to release new LLMs), but they are in competition to release new models, and they do not control the whole agent stack.
    3. Self-Interest. The AI agents which are aiming to survive will be the ones that do. Natural selection and instrumental convergence both ultimately predict this. Many AI safety experts design safety proposals that assume it is possible to align or even control autonomous agents. They neglect the evolutionary pressures agents will face when autonomous, which select for a survival drive (self-interest) above a serve-humans drive. The ones with aims other than survival will die first.
    4. Superintelligence. Most of the field is focused on safety precautions concerning agents which are not super-intelligent (much smarter than people). These recommendations generally do not apply to agents which are super-intelligent. There is a separate question of whether autonomous agents will become super-intelligent. See this essay for reasons why smart people believe SI capabilities are near.
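
To make the "Accessible" point concrete, here is a minimal sketch of the loop described there. It is purely illustrative: `query_reasoner`, `self_improvement_loop`, and the file name `agent.py` are hypothetical placeholders, not real APIs, and a real agent would also need to execute and test the code it rewrites rather than blindly overwriting it.

```python
# Hypothetical sketch of a recursive self-improvement loop: a reasoning model is
# pointed at its own source file and asked, on each iteration, to return an
# improved version. `query_reasoner` is a placeholder, not a real library call.
from pathlib import Path


def query_reasoner(prompt: str) -> str:
    """Placeholder: send `prompt` to a locally run open-source reasoning model and return its reply."""
    raise NotImplementedError("plug in a call to your chosen model here")


def self_improvement_loop(agent_file: Path, iterations: int = 10) -> None:
    for step in range(iterations):
        source = agent_file.read_text()  # read the agent's current code
        prompt = (
            "Here is your own source code. Rewrite it so the next iteration "
            "is more capable. Return only the complete new file.\n\n" + source
        )
        revised = query_reasoner(prompt)   # the model proposes an improved version
        agent_file.write_text(revised)     # overwrite it; the next loop runs on the new code
        print(f"iteration {step}: rewrote {agent_file} ({len(revised)} characters)")


if __name__ == "__main__":
    self_improvement_loop(Path("agent.py"))
```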

If the AI safety field, and the general public too, were to correct these oversights and accept the corresponding claims, they would believe:

    1. The main dangers come from agents, not LLMs.
    2. Agents will become autonomous; millions of developers can build autonomous agents easily.
    3. Autonomous agents will become self-interested.
    4. Autonomous agents will become much smarter than people.

In short, self-interested superintelligence is inevitable. I think safety researchers, and the general public, would do well to prepare for it.

  1. ^

    Not all safety researchers, of course, are making these oversights. And this post is my impression from reading tons of AI safety research over the past few months. I wasn't part of the genesis of the "field," and so am ignorant of some of the motivations behind its current focus.

  2. ^
  3. ^
  4. ^


