LessWrong · July 9, 19:17
No, We're Not Getting Meaningful Oversight of AI

 

This article examines a common assumption in AI safety discussions: that "oversight" can effectively address AI risk. The author argues that this belief is often mistaken, because meaningful oversight is both hard to achieve and frequently absent. The article calls on AI developers to stop treating "oversight" as a cure-all, and stresses the importance of specifying the type of oversight, the risks it addresses, its failure modes, and its effectiveness. Stories alone will not prevent disasters; more rigorous AI safety measures are needed.

🧐 The common assumption: In AI safety discussions, especially outside technical alignment circles, it is widely assumed that oversight will solve the problem, for example through a human in the loop, red-team audits, or a governance committee reviewing deployments.

🤔 The absence of oversight: The author argues that this faith in oversight is often misplaced; meaningful oversight is both hard to achieve and frequently absent. Even where oversight is possible, it may not actually be in place.

📢 The importance of specifying oversight: The author argues that AI developers need to stop invoking "oversight" as a cure-all. Anyone claiming an AI system is supervised should specify the type of supervision (control vs. oversight), the risks it addresses, its failure modes, and how effective it is.

⚠️ Stories don't stop disasters: The author emphasizes that stories and promises alone are not enough to prevent AI safety failures. Effective AI safety requires more rigorous, more concrete measures.

Published on July 9, 2025 11:10 AM GMT

One of the most common (and comfortable) assumptions in AI safety discussions—especially outside of technical alignment circles—is that oversight will save us. Whether it's a human in the loop, a red team audit, or a governance committee reviewing deployments, oversight is invoked as the method by which we’ll prevent unacceptable outcomes.

It shows up everywhere: in policy frameworks, in corporate safety reports, and in standards documents. Sometimes it's explicit, like the EU AI Act requiring that high-risk AI systems be subject to human oversight, or stated as an assumption, as in a DeepMind paper also released yesterday, which argues that scheming won't happen because AI won't be able to evade oversight. Other times it's implicit: firms claiming that they are mitigating risk through regular audits and fallback procedures, or arguments that no one will deploy unsafe systems in places without sufficient oversight.

But either way, there's a shared background belief: that meaningful oversight is both possible and either already happening or likely to happen.

In a new paper with Aidan Homewood, "Limits of Safe AI Deployment: Differentiating Oversight and Control," we argue, among other things, that both parts of this belief are often false. Where meaningful oversight is possible, it often isn't present, and in many cases it isn't even possible.

So AI developers need to stop waving "oversight" around as a magic word. If you're going to claim your AI system is supervised, you should be able to clearly say:

- which type of supervision you mean (control or oversight),
- which risks it addresses,
- how it can fail, and
- why you believe it is effective.

If you can’t do that, you don’t have oversight. You have a story. And stories don't stop disasters.



