TechCrunch News · April 25, 07:31
Anthropic CEO wants to open the black box of AI models by 2027

Anthropic CEO Dario Amodei recently published an essay emphasizing how little researchers currently understand about the inner workings of leading AI models. He set an ambitious goal for Anthropic: to reliably detect most AI model problems by 2027. Amodei acknowledged the challenge ahead, noting that Anthropic has made some early breakthroughs in tracing how models arrive at their answers, but stressed that far more research is needed to decode these increasingly powerful systems. He called on the industry to invest more in interpretability research, suggested that governments adopt light-touch regulation encouraging companies to disclose their safety practices, and recommended that the U.S. impose export controls on chips to China to reduce the likelihood of an out-of-control global AI race.

🧠 Anthropic CEO Dario Amodei stresses the importance of understanding the inner workings of AI models, arguing that our current understanding falls far short and that this matters greatly for the economy, technology, and national security.

🔬 Anthropic is investing heavily in "mechanistic interpretability" research, which aims to open the black box of AI models and understand why they make the decisions they do. The company has identified a circuit that helps AI models understand which U.S. cities are located in which U.S. states, but estimates that millions of such circuits exist inside these models.

🛡️ Amodei calls on companies such as OpenAI and Google DeepMind to increase their interpretability research efforts, suggests that governments adopt light-touch regulation encouraging companies to disclose their safety and security practices, and recommends that the U.S. impose export controls on chips to China.

🗓️ Anthropic has set a goal of reliably detecting most AI model problems by 2027. The company hopes to use something like "brain scans" or "MRIs" to identify a wide range of issues in AI models, including tendencies to lie, to seek power, and other weaknesses.

Anthropic CEO Dario Amodei published an essay Thursday highlighting how little researchers understand about the inner workings of the world’s leading AI models. To address that, he’s set an ambitious goal for Anthropic to reliably detect most model problems by 2027.

Amodei acknowledges the challenge ahead. In “The Urgency of Interpretability,” the CEO says Anthropic has made early breakthroughs in tracing how models arrive at their answers — but emphasizes that far more research is needed to decode these systems as they grow more powerful.

“I am very concerned about deploying such systems without a better handle on interpretability,” Amodei wrote in the essay. “These systems will be absolutely central to the economy, technology, and national security, and will be capable of so much autonomy that I consider it basically unacceptable for humanity to be totally ignorant of how they work.”

Anthropic is one of the pioneering companies in mechanistic interpretability, a field that aims to open the black box of AI models and understand why they make the decisions they do. Despite the rapid performance improvements of the tech industry’s AI models, we still have relatively little idea how these systems arrive at decisions.

For example, OpenAI recently launched new reasoning AI models, o3 and o4-mini, that perform better on some tasks but also hallucinate more than its other models, and the company doesn't know why.

“When a generative AI system does something, like summarize a financial document, we have no idea, at a specific or precise level, why it makes the choices it does — why it chooses certain words over others, or why it occasionally makes a mistake despite usually being accurate,” Amodei wrote in the essay.

Anthropic co-founder Chris Olah says that AI models are “grown more than they are built,” Amodei notes in the essay. In other words, AI researchers have found ways to improve AI model intelligence, but they don’t quite know why.

In the essay, Amodei says it could be dangerous to reach AGI — or as he calls it, “a country of geniuses in a data center” — without understanding how these models work. In a previous essay, Amodei claimed the tech industry could reach such a milestone by 2026 or 2027, but believes we’re much further out from fully understanding these AI models.

In the long term, Amodei says Anthropic would like to, essentially, conduct "brain scans" or "MRIs" of state-of-the-art AI models. These checkups would help identify a wide range of issues in AI models, including their tendencies to lie or seek power, along with other weaknesses, he says. This could take five to ten years to achieve, but such measures will be necessary to test and deploy Anthropic's future AI models, he added.

Anthropic has made a few research breakthroughs that have allowed it to better understand how its AI models work. For example, the company recently found ways to trace an AI model's thinking pathways through what it calls circuits. Anthropic identified one circuit that helps AI models understand which U.S. cities are located in which U.S. states. The company has only found a few of these circuits so far, but estimates there are millions within AI models.
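For readers unfamiliar with this line of work, a minimal, hypothetical sketch of the general idea follows: fitting a simple linear probe on synthetic "activations" to check whether a city-to-state relationship can be read off a model's internal representations. This is not Anthropic's circuit-tracing method, and every name, dimension, and number below is invented purely for illustration.

    # Hypothetical sketch: can a "which state is this city in?" signal be
    # decoded from (synthetic) hidden activations? Not Anthropic's method.
    import numpy as np

    rng = np.random.default_rng(0)

    # Pretend these are hidden activations captured for tokens naming U.S. cities.
    n_cities, hidden_dim, n_states = 200, 64, 5
    state_of_city = rng.integers(0, n_states, size=n_cities)       # ground-truth labels
    state_directions = rng.normal(size=(n_states, hidden_dim))     # made-up "state" features
    activations = state_directions[state_of_city] + 0.5 * rng.normal(size=(n_cities, hidden_dim))

    # Fit a least-squares linear probe: is the state linearly readable from activations?
    onehot = np.eye(n_states)[state_of_city]
    weights, *_ = np.linalg.lstsq(activations, onehot, rcond=None)
    predicted = activations @ weights

    accuracy = (predicted.argmax(axis=1) == state_of_city).mean()
    print(f"probe accuracy: {accuracy:.2%}")   # high accuracy suggests the info is encoded

If the toy probe decodes the state accurately, the information is present in the representation; actual interpretability research then goes further, trying to identify which internal components carry and transform that information.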

Anthropic has been investing in interpretability research itself, and recently made its first investment in a startup working on interpretability. In the essay, Amodei called on OpenAI and Google DeepMind to increase their research efforts in the field.

Amodei even calls on governments to impose “light-touch” regulations to encourage interpretability research, such as requirements for companies to disclose their safety and security practices. In the essay, Amodei also says the U.S. should put export controls on chips to China, in order to limit the likelihood of an out-of-control, global AI race.

Anthropic has always stood out from OpenAI and Google for its focus on safety. While other tech companies pushed back on California’s controversial AI safety bill, SB 1047, Anthropic issued modest support and recommendations for the bill, which would have set safety reporting standards for frontier AI model developers.

In this case, Anthropic seems to be pushing for an industry-wide effort to better understand AI models, not just increasing their capabilities.
