少点错误 2024年12月20日
Reminder: AI Safety is Also a Behavioral Economics Problem
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

OpenAI发布o1模型时,其安全测试并非针对最终发布版本,引发关注。有工程师表示,这是因为模型迭代速度太快,导致安全测试成为负担。文章指出,目前AI安全测试是自愿的,而非法律强制。在激烈的竞争环境下,研究人员可能为了赶进度而忽视或简化安全测试。这提醒我们,AI安全不仅是技术问题,更是行为经济学问题。要提高测试质量,需降低研究人员的测试负担,使他们愿意主动进行安全评估。

⚠️ OpenAI的o1模型安全测试,并非针对最终发布版本进行,引发了对测试有效性的质疑。

🚀 工程师的解释是,模型迭代速度过快,导致安全测试成为一种负担,为了赶进度,测试可能被简化。

⚖️ AI安全测试目前是自愿行为,而非法律强制,这导致在竞争压力下,公司可能选择降低测试标准。

🎯 降低研究人员的测试负担是提高AI安全测试质量的关键,使他们愿意主动进行安全评估,而非将其视为阻碍。

💡 AI安全不仅是技术问题,更是一个行为经济学问题,需要从激励机制入手,鼓励更严格的安全测试。

Published on December 20, 2024 1:40 AM GMT

Last week, OpenAI released the official version of o1, alongside a system card explaining their safety testing framework. Astute observers, most notably Zvi, noted something peculiar: o1's safety testing was performed on a model that... wasn't the release version of o1 (or o1 pro).

Weird! Unexpected! If you care about AI safety, bad! If you fall in this last camp your reaction was probably something like Zvi's:

That’s all really, really, really not okay.

While Zvi's post thoroughly examines the tests, their unclear results, etc., I wanted to zoom in a little more on this tweet from roon (OAI engineer):

"unironically the reason [this happened] is that progress is so fast that we have to write more of these model cards these days. the preparedness evals are more  to certify that models aren't dangerous rather than strict capability evals for showing off"

My loose translation is something like: "these tests are annoying to run so for our use-case rough approximation is good enough." 

You may not like it, but the following is simply fact: AI safety tests are voluntary, not legally mandated. The core issue wasn't that OpenAI didn't recognize they were cutting corners – the tests were just a pain in the ass.

Put yourself in a researcher's shoes: you're moving fast, competing in a race to the bottom with other companies. You develop an AI model that, while powerful, clearly can't cause Armageddon (yet). Meanwhile, what you view as a slightly histrionic cadre of alarmed critics are demanding you stop work and/or spend significant personal time conducting safety testing that you don't think in this instance rises to the level of "necessary for the future of humanity." Do you: 

A) Rush through it to move on with your life 

B) Meticulously complete a task you believe is excessive, crossing every t and dotting every i

The quiz is left as an exercise for the reader.

This saga is a friendly reminder: today, AI safety testing is a choice companies make, not mandatory. If researchers or developers dislike running your tests, they'll cut corners. AI safety isn't just a technical challenge – it's a behavioral economics problem.

Until something fundamentally changes (and let's be clear, that's unlikely in the near term), researcher pain (RP) is a key KPI of AI safety test quality. Shaming people is certainly a strategy, but frankly I don't think it works well.

To be solution-oriented, this presents a clear target for safety research: make tests that are so low RP even AI safety skeptics see no cost in running them. Actually being run is an important quality of a safety eval!



Discuss

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

AI安全 OpenAI 安全测试 行为经济学 模型迭代
相关文章