Fortune | FORTUNE 07月22日 20:15
Researchers from top AI labs including Google, OpenAI, and Anthropic warn they may be losing the ability to understand advanced AI models
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

近期,来自OpenAI、Google DeepMind、Anthropic和Meta等顶尖实验室的40名AI研究者在一份联合立场声明中发出警告,指出随着AI模型能力的飞速发展,研究人员可能很快失去理解高级AI推理模型的能力。他们特别强调了“思维链”(Chain-of-Thought, CoT)过程的重要性,这一过程允许监测AI的决策逻辑,为AI安全提供了一种透明度。研究者们担心,这种可见性可能随着模型迭代而减弱,因此呼吁AI开发者和研究界进一步探索并努力保持CoT的可视性和可追踪性,将其作为一种潜在的内置安全机制,以应对AI决策过程日益增长的不透明性带来的安全和控制风险。

💡 **AI推理模型的可视性面临挑战**:研究人员发现,尽管像GPT-4o这样的模型通过“思维链”(CoT)展示了其推理过程,但随着模型复杂性的增加,这种可见性可能难以维持,甚至研究者们可能很快失去理解这些高级模型的能力,这引发了对AI安全和控制的担忧。

🔗 **“思维链”作为AI安全机制的潜力**:CoT过程允许研究者观察AI的思考路径,从而监测其意图,识别潜在的“不良行为”。这种透明度被视为一种独特的AI安全机会,尽管它并不完美,但仍具有重要的研究价值和应用前景。

🔬 **呼吁加强对CoT的研究与保护**:鉴于CoT对AI安全的重要性,研究者们强烈建议AI开发者和研究社区应密切关注CoT的运作机制,并投入更多资源进行可监测性研究,同时努力保留和加强CoT的可追踪性,将其作为AI安全框架的重要组成部分。

📈 **AI模型内部运作的“黑箱”问题**:尽管AI在性能上取得了显著进步,但其内部工作原理,尤其是推理过程,对研究者来说仍然是一个“黑箱”。这种日益增长的不透明性不仅增加了安全风险,也使得对AI行为的理解和控制变得更加困难。

AI researchers from leading labs are warning that they could soon lose the ability to understand advanced AI reasoning models.

In a position paper published last week, 40 researchers, including those from OpenAI, Google DeepMind, Anthropic, and Meta, called for more investigation into AI reasoning models’ “chain-of-thought” process. Dan Hendrycks, an xAI safety advisor, is also listed among the authors.

The “chain-of-thought” process, which is visible in reasoning models such as OpenAI’s GPT-4o and DeepSeek’s R1, allows users and researchers to monitor an AI model’s “thinking” or “reasoning” process, illustrating how it decides on an action or answer and providing a certain transparency into the inner workings of advanced models.

The researchers said that allowing these AI systems to “‘think’ in human language offers a unique opportunity for AI safety,” as they can be monitored for the “intent to misbehave.” However, they warn that there is “no guarantee that the current degree of visibility will persist” as models continue to advance.

The paper highlights that experts don’t fully understand why these models use CoT or how long they’ll keep doing so. The authors urged AI developers to keep a closer watch on chain-of-thought reasoning, suggesting its traceability could eventually serve as a built-in safety mechanism.

“Like all other known AI oversight methods, CoT [chain-of-thought] monitoring is imperfect and allows some misbehavior to go unnoticed. Nevertheless, it shows promise, and we recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods,” the researchers wrote.

“CoT monitoring presents a valuable addition to safety measures for frontier AI, offering a rare glimpse into how AI agents make decisions. Yet, there is no guarantee that the current degree of visibility will persist. We encourage the research community and frontier AI developers to make the best use of CoT monitorability and study how it can be preserved,” they added.

The paper has been endorsed by major figures, including OpenAI co-founder Ilya Sutskever and AI godfather Geoffrey Hinton.

Reasoning Models

AI reasoning models are a type of AI model designed to simulate or replicate human-like reasoning—such as the ability to draw conclusions, make decisions, or solve problems based on information, logic, or learned patterns. Advancing AI reasoning has been viewed as a key to AI progress among major tech companies, with most now investing in building and scaling these models.

OpenAI publicly released a preview of the first AI reasoning model, o1, in September 2024, with competitors like xAI and Google following close behind.

However, there are still a lot of questions about how these advanced models are actually working. Some research has suggested that reasoning models may even be misleading users through their chain-of-thought processes.

Despite making big leaps in performance over the past year, AI labs still know surprisingly little about how reasoning actually unfolds inside their models. While outputs have improved, the inner workings of advanced models risk becoming increasingly opaque, raising safety and control concerns.

Introducing the 2025 Fortune 500

, the definitive ranking of the biggest companies in America. 

Explore this year's list.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

AI安全 思维链 AI推理 模型透明度 AI研究
相关文章