Comparing risk from internally-deployed AI to insider and outsider threats from humans

This article explores the relationship between AI control and traditional computer security, focusing on the distinction between "security from outsiders" and "security from insiders". The author argues that when AI companies deploy AIs internally, they often overlook the insider threat those AIs may pose and instead treat them as outsiders to be guarded against. Compared with human engineers, AIs may need broader, more fine-grained access, which enlarges the attack surface and makes static security measures insufficient. The article stresses that insider threat from AI combines features of traditional insider and outsider threats, posing a harder security challenge for AI companies.

🛡️ The author distinguishes two main modes of computer security: "security from outsiders" and "security from insiders". Security from outsiders restricts user access through static, automated systems, as in how Facebook and AWS protect themselves from their users. This mode aims to prevent users from violating fundamental security properties and relies on code that runs without human intervention.

🧑‍💻 Security from insiders concerns defending against insiders, such as engineers, abusing their access. Companies rely on human-dependent measures such as multi-party authorization and code review. Compared with outsider threats, insider threats involve more fine-grained access needs, but insiders are fewer in number and have less ability to iterate against security measures.

🤖 The author argues that AI companies often fail to anticipate the insider threat posed by internally deployed AIs. AI engineers may need the kind of broad access human insiders have, enlarging the attack surface and defeating static security methods. Insider threat from AI combines the hard parts of both insider and outsider threats, making it harder to defend against.

Published on June 23, 2025 5:47 PM GMT

I’ve been thinking a lot recently about the relationship between AI control and traditional computer security. Here’s one point that I think is important.

My understanding is that there's a big qualitative distinction between two ends of a spectrum of security work that organizations do, which I'll call “security from outsiders” and “security from insiders”.

On the “security from outsiders” end of the spectrum, you have some security invariants you try to maintain entirely by restricting affordances with static, entirely automated systems. My sense is that this is most of how Facebook or AWS relates to its users: they want to ensure that, no matter what actions the users take on their user interfaces, they can't violate fundamental security properties. For example, no matter what text I enter into the "new post" field on Facebook, I shouldn't be able to access the private messages of an arbitrary user. And on AWS, no matter what code I run on my EC2 instance, I shouldn't be able to access someone else's private S3 bucket.

This type of security is very "static": invariants are enforced by code that doesn't involve human review, so they don't rely on humans reacting properly to anything. This is typically how you mitigate risks from your users--users typically only need certain restricted types of access to your systems, and they're very numerous, so you need to handle them with automated approaches. I sometimes call systems that aim to be safe no matter what inputs a user provides "manifestly safe".
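To make the "manifestly safe" idea concrete, here is a minimal, hypothetical Python sketch of the kind of static authorization check such systems rely on; the names (`Request`, `is_allowed`) are illustrative and not drawn from any real Facebook or AWS API. The point is that the invariant is enforced in code on every request, with no human in the loop.

```python
# Hypothetical sketch of a "manifestly safe" static check: every request is
# evaluated against an automated policy with no human review involved.
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    user_id: str         # authenticated identity of the caller
    resource_owner: str  # owner of the object being accessed
    action: str          # e.g. "read_post", "read_private_messages"

# The only actions a user may ever take on someone else's resources.
PUBLIC_ACTIONS = {"read_post"}

def is_allowed(req: Request) -> bool:
    """No input the user supplies can widen access beyond this invariant."""
    if req.user_id == req.resource_owner:
        return True                       # users may access their own data
    return req.action in PUBLIC_ACTIONS   # otherwise, public actions only

# A user can never read another user's private messages, whatever they type.
assert not is_allowed(Request("alice", "bob", "read_private_messages"))
```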

In contrast, consider how Facebook and AWS relate to their engineers. These companies want to be robust to their engineers abusing their access to violate security properties; they could do this either by abusing administrator privileges (e.g. direct access to machines that have user data) or by adding malign behavior to code that they write. In order to mitigate these risks, companies use very different types of security techniques. They rely on multi-party authorization, where multiple engineers have to sign off on executing sensitive commands, or code review, where someone needs to review code before it's pushed to environments where it has access to sensitive data. These techniques fundamentally rely on human labor.
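As an illustration (again a hypothetical sketch, not any company's actual tooling), a multi-party authorization gate can be as simple as requiring sign-off from some number of engineers other than the requester before a sensitive command runs:

```python
# Hypothetical multi-party authorization gate: a sensitive command executes
# only once enough engineers, other than the requester, have signed off.
REQUIRED_APPROVALS = 2

def can_execute(requester: str, approvers: set[str]) -> bool:
    # The requester cannot count toward approval of their own command.
    independent_approvers = approvers - {requester}
    return len(independent_approvers) >= REQUIRED_APPROVALS

assert not can_execute("alice", {"alice"})            # self-approval rejected
assert can_execute("alice", {"bob", "carol"})         # two independent sign-offs
```

Unlike the static check above, this mechanism fundamentally depends on humans (the approvers) exercising judgment correctly.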

Insider threats are harder to handle in that your engineers need to have much more fine-grained access, but easier in that your engineers are generally less numerous than your users. Moreover, insiders don't have as much ability to iterate against your security measures (because if you catch them trying to break your security measures, you can often fire them or have them arrested).

Of course, there's a spectrum here. A big part of how you mitigate threats from external hackers is having security staff investigate weird anomalies, in the hope of catching hackers who have broken some of your intended security boundaries. And a big part of how you mitigate insider threats is building safe administrative APIs, so that instead of your employees having to have the ability to run arbitrary code on production machines, you allow them to usually do their job by taking pre-specified, known-safe actions. And there’s much more along these lines.
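For example, a safe administrative API in this sense might expose only a fixed menu of pre-reviewed actions rather than arbitrary code execution. The sketch below is hypothetical and just illustrates the shape of the idea.

```python
# Hypothetical "safe administrative API": instead of shell access to
# production machines, employees can only invoke pre-reviewed actions.
ALLOWED_ADMIN_ACTIONS = {
    "restart_service": lambda name: f"restarting {name}",
    "rotate_logs":     lambda name: f"rotating logs for {name}",
}

def run_admin_action(action: str, arg: str) -> str:
    handler = ALLOWED_ADMIN_ACTIONS.get(action)
    if handler is None:
        raise PermissionError(f"{action!r} is not a pre-approved admin action")
    return handler(arg)

run_admin_action("restart_service", "web-frontend")    # allowed
# run_admin_action("run_arbitrary_code", "rm -rf /")   # raises PermissionError
```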

I feel like there’s a really important distinction between “security from insiders” and “security from outsiders”, and I don’t really get why security people don’t seem to find this distinction as important.

Now that I have this concept, I can make a few points about the risks posed by AIs deployed inside AI companies. (Risks from internal deployments have been discussed by me here (as well as many other places), by a Redwood/UK AISI collaboration here, and by Apollo here.)

First, when I talk to security staff at AI companies about computer security, they often seem to fail to anticipate what insider threat from AIs will be like. In particular, when discussing how they'll deploy AI engineers and researchers internally, they often seem to imagine relating to those AIs mostly as outsiders rather than as insiders. They imagine writing small and carefully locked-down infrastructure and allowing the AIs to interact with it. But my guess is that in order to do their jobs, AIs will need to have the kind of wide-ranging access that insiders have, which leads to having an attack surface that is large and can’t be protected with static methods. (I’m not sure I’m right about this, and it seems very important to get clarity on; I plan to write more about it soon.) I think the security teams at AI companies are not putting enough work into preparing for this.

And second, insider threat from AI combines the hard parts of both problems! Like insider threat, AIs will need extremely broad, fine-grained access. And like outsider threat, you're outnumbered by your potential adversaries. (Of course, you have a bunch of advantages too, and that's why we research AI control rather than just writing off the problem as hopeless.)



