How threats from internally-deployed AI compare to insider and outsider threats from humans

This article explores the relationship between AI control and traditional computer security, focusing on the security challenges that arise when AIs are deployed inside AI companies. The author distinguishes "security from outsiders" from "security from insiders", and argues that AIs doing engineering work may need the same kind of wide-ranging access as human insiders, which enlarges the attack surface. The article stresses that insider threat from AI combines the difficulties of both insider and outsider security, and calls on security teams at AI companies to take this challenge seriously and prepare for it.

🛡️ The author distinguishes two main types of security: "security from outsiders" and "security from insiders". The former restricts what users can do with static, fully automated systems, as in how Facebook and AWS secure themselves against ordinary users; the latter relies on human review and multi-party authorization to counter abuse by insiders, as in how Facebook and AWS constrain their own engineers.

🧑‍💻 The author argues that when AIs are deployed inside AI companies, the risk they pose looks more like an insider threat. Like human insiders, AIs doing engineering and research work may need wide-ranging access to do their jobs, which creates an attack surface that is large and hard to protect with static methods.

🤯 Insider threat from AI combines the complexities of both insider and outsider threats: the AIs need the fine-grained access of insiders, while the number of potential "attackers" is large, as with outsiders. The article emphasizes that security teams at AI companies need to take this challenge seriously and prepare for it.

Published on June 23, 2025 5:47 PM GMT

I’ve been thinking a lot recently about the relationship between AI control and traditional computer security. Here’s one point that I think is important.

My understanding is that there's a big qualitative distinction between two ends of a spectrum of security work that organizations do, that I’ll call “security from outsiders” and “security from insiders”.

On the “security from outsiders” end of the spectrum, you have some security invariants you try to maintain entirely by restricting affordances with static, fully automated systems. My sense is that this is most of how Facebook or AWS relates to its users: they want to ensure that, no matter what actions users take in the user interface, they can't violate fundamental security properties. For example, no matter what text I enter into the "new post" field on Facebook, I shouldn't be able to access the private messages of an arbitrary user. And on AWS, no matter what code I run on my EC2 instance, I shouldn't be able to access someone else's private S3 bucket.

This type of security is very "static": invariants are enforced by code that doesn't involve human review, so enforcement doesn't rely on anyone reacting properly to anything. This is typically how you mitigate risks from your users: they only need certain restricted types of access to your systems, and they're very numerous, so you have to handle them with automated approaches. I sometimes call systems that aim to be safe no matter what inputs a user provides "manifestly safe".
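To make that concrete, here is a minimal, hypothetical sketch of what this kind of static enforcement can look like (the names and data model are invented for illustration): the invariant "a user can only read their own messages" holds no matter what the client sends, because the lookup is scoped to the authenticated user by construction rather than by anyone reviewing requests.

```python
# Hypothetical sketch of "security from outsiders": the invariant
# "a user can only read their own private messages" is enforced purely
# by code, with no human review in the loop.

from dataclasses import dataclass


@dataclass
class Request:
    authenticated_user_id: str  # set by the auth layer, not by request content
    target_user_id: str         # whatever the client asked for


class MessageStore:
    def __init__(self) -> None:
        self._messages: dict[str, list[str]] = {}  # user_id -> private messages

    def messages_for(self, user_id: str) -> list[str]:
        return list(self._messages.get(user_id, []))


def handle_read_messages(store: MessageStore, req: Request) -> list[str]:
    # The query is keyed on the authenticated caller, so no value of
    # req.target_user_id can widen it to another user's messages.
    return store.messages_for(req.authenticated_user_id)
```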

In contrast, consider how Facebook and AWS relate to their engineers. These companies want to be robust to their engineers abusing their access to violate security properties; they could do this either by abusing administrator privileges (e.g. direct access to machines that have user data) or by adding malign behavior to code that they write. In order to mitigate these risks, companies use very different types of security techniques. They rely on multi-party authorization, where multiple engineers have to sign off on executing sensitive commands, or code review, where someone needs to review code before it's pushed to environments where it has access to sensitive data. These techniques fundamentally rely on human labor.
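Here's an equally simplified, hypothetical sketch of the multi-party-authorization pattern (the API is invented for illustration): a sensitive command refuses to run until enough engineers other than the requester have signed off, so the control ultimately bottoms out in human judgment rather than in static code.

```python
# Hypothetical sketch of "security from insiders": a sensitive action
# requires sign-off from multiple distinct humans before it executes.

class ApprovalRequired(Exception):
    pass


class SensitiveCommand:
    def __init__(self, description: str, action, min_approvals: int = 2):
        self.description = description
        self._action = action                  # callable that does the real work
        self._min_approvals = min_approvals
        self._approvers: set[str] = set()

    def approve(self, engineer_id: str) -> None:
        self._approvers.add(engineer_id)

    def execute(self, requested_by: str):
        # The requester's own approval doesn't count toward the quorum.
        independent = self._approvers - {requested_by}
        if len(independent) < self._min_approvals:
            raise ApprovalRequired(
                f"{self.description!r} has {len(independent)} of "
                f"{self._min_approvals} required approvals"
            )
        return self._action()


# Usage: deleting an account's data only proceeds after two other engineers sign off.
cmd = SensitiveCommand("delete user data for account 123", lambda: "deleted")
cmd.approve("alice")
cmd.approve("bob")
print(cmd.execute(requested_by="carol"))  # -> deleted
```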

Insider threats are harder to handle in that your engineers need to have much more fine-grained access, but easier in that your engineers are generally less numerous than your users. Moreover, insiders don't have as much ability to iterate against your security measures (because if you catch them trying to break your security measures, you can often fire them or have them arrested).

Of course, there's a spectrum here. A big part of how you mitigate threats from external hackers is having security staff investigate weird anomalies, in the hope of catching hackers who have broken some of your intended security boundaries. And a big part of how you mitigate insider threats is building safe administrative APIs, so that instead of your employees having to have the ability to run arbitrary code on production machines, you allow them to usually do their job by taking pre-specified, known-safe actions. And there’s much more along these lines.
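As a hypothetical sketch of that last idea (the actions and API are invented for illustration), a safe administrative API exposes a small allowlist of parameterized, logged actions, so employees can do most of their day-to-day work without ever needing the ability to run arbitrary code in production.

```python
# Hypothetical sketch of a safe administrative API: employees invoke
# pre-specified, known-safe actions instead of running arbitrary code on
# production machines, and every call is logged for later review.

import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("admin_api")


def restart_service(service_name: str) -> str:
    return f"restarted {service_name}"


def resend_verification_email(user_id: str) -> str:
    return f"verification email queued for {user_id}"


# The allowlist is the whole attack surface: nothing outside it is reachable.
ALLOWED_ACTIONS = {
    "restart_service": restart_service,
    "resend_verification_email": resend_verification_email,
}


def run_admin_action(employee_id: str, action: str, **kwargs) -> str:
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"{action!r} is not a pre-approved admin action")
    log.info("employee=%s action=%s args=%s", employee_id, action, kwargs)
    return ALLOWED_ACTIONS[action](**kwargs)


# Usage:
print(run_admin_action("dana", "restart_service", service_name="billing"))
```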

I feel like there’s a really important distinction between “security from insiders” and “security from outsiders”, and I don’t really get why security people don’t seem to find this distinction as important.

Now that I have this concept, I can make a few points about the risks posed by AIs deployed inside AI companies. (Risks from internal deployments have been discussed by me here (as well as many other places), by a Redwood/UK AISI collaboration here, and by Apollo here.)

First, when I talk to security staff at AI companies about computer security, they often seem to fail to anticipate what insider threat from AIs will be like. In particular, when discussing how they'll deploy AI engineers and researchers internally, they often seem to imagine relating to those AIs mostly as outsiders rather than as insiders. They imagine writing small and carefully locked-down infrastructure and allowing the AIs to interact with it. But my guess is that in order to do their jobs, AIs will need to have the kind of wide-ranging access that insiders have, which leads to having an attack surface that is large and can’t be protected with static methods. (I’m not sure I’m right about this, and it seems very important to get clarity on; I plan to write more about it soon.) I think the security teams at AI companies are not putting enough work into preparing for this.

And second, insider threat from AI combines the hard parts of both problems! As with insider threats, the AIs will need extremely broad, fine-grained access. And as with outsider threats, you're outnumbered by your potential adversaries. (Of course, you have a bunch of advantages too, and that's why we research AI control rather than just writing off the problem as hopeless.)


