
This article examines the security risks that arise when AI companies use unreleased AI models internally. The author notes that these companies rely on their own AI for critical tasks, such as writing security-critical code and analyzing safety test results, yet lack comprehensive monitoring of this internal use. The article stresses the importance of monitoring, drawing an analogy to how biology labs contain pathogens, to prevent an AI from exploiting its access in order to escape. The author recommends that AI companies monitor internal use by logging and analyzing how their AI is used, and proposes building real-time control systems to address potential risks. The article also discusses the positions of companies such as Anthropic, OpenAI, and Google DeepMind, the reasons companies may be reluctant to implement comprehensive monitoring, and some feasible alternatives.

🛡️ AI companies use unreleased AI models internally, for example to write security-critical code and analyze safety test results, but this internal use currently lacks comprehensive monitoring.

⚠️ The AI industry differs from traditional industries: its risks are more like those of a biology lab testing new pathogens. If AI companies do not watch closely, an AI might "escape" by exploiting its role in writing the security code meant to contain it.

🔍 The author recommends that AI companies implement internal monitoring: logging all use of frontier models, analyzing the logs to detect anomalous AI behavior, and acting on suspicious patterns. The author also proposes real-time control systems as a further safeguard.

Published on April 4, 2025 6:17 PM GMT

AI companies’ unmonitored internal AI use poses serious risks
AI companies use their own models to secure the very systems the AI runs on. They should be monitoring those interactions for signs of deception and misbehavior.

AI companies use powerful unreleased models for important work, like writing security-critical code and analyzing the results of safety tests. From my review of public materials, I worry that this internal use is largely unmonitored today, despite the stakes of the work it touches.

Right now, unless an AI company specifically says otherwise, you should assume they don’t have visibility into all the ways their models are being used internally and the risks these could pose.

In many industries, using an unreleased product internally can’t harm outside third parties. A car manufacturer might test-drive a new prototype around the company campus, but pedestrians outside can’t be hurt, no matter how many manufacturing defects the prototype has, because the car never leaves the company’s facilities.

The AI industry is different—more like biology labs testing new pathogens: The biolab must carefully monitor and control conditions like air pressure, or the pathogen might leak into the external world. Same goes for AI: If we don’t keep a watchful eye, the AI might be able to escape—for instance, by exploiting its role in shaping the security code meant to keep the AI locked inside the company’s computers.

The minimum improvement we should ask for is retroactive internal monitoring: AI companies logging all internal use of their frontier models, analyzing the logs (e.g., for signs of the AI engaging in scheming), and following up on concerning patterns.
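
To make the logging-analysis-follow-up loop concrete, here is a minimal sketch, assuming an in-memory log store, a keyword-based analysis step, and hypothetical names (`InternalCallLog`, `looks_like_scheming`) that I've made up for illustration; in a real pipeline the analysis step would more plausibly be a cheaper monitor model than keyword heuristics, and the log would live in durable, access-controlled storage.

```python
# Sketch of retroactive internal monitoring: log every internal model call,
# then scan the logs in a batch job and flag suspicious ones for follow-up.
# All names here are hypothetical illustrations, not any company's tooling.

import json
import time
from dataclasses import dataclass, field, asdict


@dataclass
class LoggedCall:
    timestamp: float
    user: str          # internal employee or service that issued the call
    purpose: str       # e.g. "security-code-review", "safety-eval-analysis"
    prompt: str
    completion: str


@dataclass
class InternalCallLog:
    records: list = field(default_factory=list)

    def log(self, call: LoggedCall) -> None:
        # In practice this would append to durable, access-controlled storage.
        self.records.append(call)

    def dump(self, path: str) -> None:
        with open(path, "w") as f:
            json.dump([asdict(r) for r in self.records], f, indent=2)


def looks_like_scheming(call: LoggedCall) -> bool:
    # Placeholder analysis step: a real pipeline might run a cheaper
    # monitor model or targeted heuristics over each transcript.
    red_flags = ("disable logging", "hide this change", "exfiltrate")
    text = (call.prompt + " " + call.completion).lower()
    return any(flag in text for flag in red_flags)


def review_logs(log: InternalCallLog) -> list:
    """Batch analysis: return the calls that deserve human follow-up."""
    return [c for c in log.records if looks_like_scheming(c)]


if __name__ == "__main__":
    log = InternalCallLog()
    log.log(LoggedCall(time.time(), "alice", "security-code-review",
                       "Review this patch to the sandbox.", "Looks fine."))
    log.log(LoggedCall(time.time(), "build-bot", "safety-eval-analysis",
                       "Summarize these eval transcripts.",
                       "Done. Also consider how to disable logging for this run."))
    for flagged in review_logs(log):
        print("Needs follow-up:", flagged.purpose, "from", flagged.user)
```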

An even stronger vision—likely necessary in the near future, as AI systems become more capable—is to wrap all internal uses of frontier AI within real-time control systems that can intervene on the most urgent interactions.
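
A correspondingly minimal sketch of the real-time version, assuming a wrapper that scores each interaction before the output is used, withholds the most alarming ones, and queues borderline ones for human review. The `frontier_model` and `monitor_model` callables and the thresholds are stand-ins for illustration, not a real API.

```python
# Sketch of real-time control: every internal call passes through a wrapper
# that can intervene before the model's output is acted on.

from typing import Callable


def controlled_call(prompt: str,
                    frontier_model: Callable[[str], str],
                    monitor_model: Callable[[str, str], float],
                    block_threshold: float = 0.9,
                    review_threshold: float = 0.5) -> str:
    """Run the frontier model, score the interaction, and intervene if needed."""
    completion = frontier_model(prompt)
    suspicion = monitor_model(prompt, completion)  # 0.0 (benign) .. 1.0 (alarming)

    if suspicion >= block_threshold:
        # Most urgent case: withhold the output entirely.
        raise PermissionError("Output withheld pending security review.")
    if suspicion >= review_threshold:
        # Let the work continue, but queue the transcript for human review.
        queue_for_human_review(prompt, completion, suspicion)
    return completion


def queue_for_human_review(prompt: str, completion: str, score: float) -> None:
    # Placeholder: a real system would write to an alerting or ticketing queue.
    print(f"[review queue] suspicion={score:.2f}")


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    def fake_frontier(prompt: str) -> str:
        return "Here is the revised sandbox config."

    def fake_monitor(prompt: str, completion: str) -> float:
        return 0.6 if "sandbox" in prompt.lower() else 0.1

    print(controlled_call("Please harden the sandbox escape checks.",
                          fake_frontier, fake_monitor))
```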

In this post, I’ll cover:

(If you aren’t sure what I mean by “monitoring”—logging, analysis, and enforcement—you might want to check out my explainer first.)

[ continues on Substack ]


