How threats from internally-deployed AI compare to insider and outsider threats from humans

This article explores the relationship between AI control and traditional computer security, focusing on the security challenges that arise when AIs are deployed inside AI companies. The author distinguishes "security from outsiders" from "security from insiders", and argues that AIs doing engineering work may need the same kind of wide-ranging access as human insiders, which enlarges the attack surface. The article stresses that insider threat from AI combines the difficulties of both insider and outsider security, and calls on security teams at AI companies to take this challenge seriously and prepare for it.

🛡️ The author distinguishes two main types of security: "security from outsiders" and "security from insiders". The former restricts what users can do with static, fully automated systems, as in how Facebook and AWS secure themselves against ordinary users; the latter relies on human review and multi-party authorization to counter abuse by insiders, as in how Facebook and AWS constrain their own engineers.

🧑‍💻 The author argues that when AIs are deployed inside AI companies, the risk they pose looks more like an insider threat. Like human insiders, AIs doing engineering and research work may need wide-ranging access to do their jobs, which creates an attack surface that is large and hard to protect with static methods.

🤯 Insider threat from AI combines the complexities of both insider and outsider threats: the AIs need the fine-grained access of insiders, while the number of potential "attackers" is large, as with outsiders. The article emphasizes that security teams at AI companies need to take this challenge seriously and prepare for it.

Published on June 23, 2025 5:47 PM GMT

I’ve been thinking a lot recently about the relationship between AI control and traditional computer security. Here’s one point that I think is important.

My understanding is that there's a big qualitative distinction between two ends of a spectrum of security work that organizations do, that I’ll call “security from outsiders” and “security from insiders”.

On the “security from outsiders” end of the spectrum, you have some security invariants you try to maintain entirely by restricting affordances with static, fully automated systems. My sense is that this is most of how Facebook or AWS relates to its users: they want to ensure that, no matter what actions users take in the user interface, they can't violate fundamental security properties. For example, no matter what text I enter into the "new post" field on Facebook, I shouldn't be able to access the private messages of an arbitrary user. And on AWS, no matter what code I run on my EC2 instance, I shouldn't be able to access someone else's private S3 bucket.

This type of security is very "static": invariants are enforced by code that doesn't involve human review, so enforcement doesn't rely on anyone reacting properly to anything. This is typically how you mitigate risks from your users: they only need certain restricted types of access to your systems, and they're very numerous, so you have to handle them with automated approaches. I sometimes call systems that aim to be safe no matter what inputs a user provides "manifestly safe".
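To make that concrete, here is a minimal, hypothetical sketch of what this kind of static enforcement can look like (the names and data model are invented for illustration): the invariant "a user can only read their own messages" holds no matter what the client sends, because the lookup is scoped to the authenticated user by construction rather than by anyone reviewing requests.

```python
# Hypothetical sketch of "security from outsiders": the invariant
# "a user can only read their own private messages" is enforced purely
# by code, with no human review in the loop.

from dataclasses import dataclass


@dataclass
class Request:
    authenticated_user_id: str  # set by the auth layer, not by request content
    target_user_id: str         # whatever the client asked for


class MessageStore:
    def __init__(self) -> None:
        self._messages: dict[str, list[str]] = {}  # user_id -> private messages

    def messages_for(self, user_id: str) -> list[str]:
        return list(self._messages.get(user_id, []))


def handle_read_messages(store: MessageStore, req: Request) -> list[str]:
    # The query is keyed on the authenticated caller, so no value of
    # req.target_user_id can widen it to another user's messages.
    return store.messages_for(req.authenticated_user_id)
```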

In contrast, consider how Facebook and AWS relate to their engineers. These companies want to be robust to their engineers abusing their access to violate security properties; they could do this either by abusing administrator privileges (e.g. direct access to machines that have user data) or by adding malign behavior to code that they write. In order to mitigate these risks, companies use very different types of security techniques. They rely on multi-party authorization, where multiple engineers have to sign off on executing sensitive commands, or code review, where someone needs to review code before it's pushed to environments where it has access to sensitive data. These techniques fundamentally rely on human labor.
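Here's an equally simplified, hypothetical sketch of the multi-party-authorization pattern (the API is invented for illustration): a sensitive command refuses to run until enough engineers other than the requester have signed off, so the control ultimately bottoms out in human judgment rather than in static code.

```python
# Hypothetical sketch of "security from insiders": a sensitive action
# requires sign-off from multiple distinct humans before it executes.

class ApprovalRequired(Exception):
    pass


class SensitiveCommand:
    def __init__(self, description: str, action, min_approvals: int = 2):
        self.description = description
        self._action = action                  # callable that does the real work
        self._min_approvals = min_approvals
        self._approvers: set[str] = set()

    def approve(self, engineer_id: str) -> None:
        self._approvers.add(engineer_id)

    def execute(self, requested_by: str):
        # The requester's own approval doesn't count toward the quorum.
        independent = self._approvers - {requested_by}
        if len(independent) < self._min_approvals:
            raise ApprovalRequired(
                f"{self.description!r} has {len(independent)} of "
                f"{self._min_approvals} required approvals"
            )
        return self._action()


# Usage: deleting an account's data only proceeds after two other engineers sign off.
cmd = SensitiveCommand("delete user data for account 123", lambda: "deleted")
cmd.approve("alice")
cmd.approve("bob")
print(cmd.execute(requested_by="carol"))  # -> deleted
```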

Insider threats are harder to handle in that your engineers need to have much more fine-grained access, but easier in that your engineers are generally less numerous than your users. Moreover, insiders don't have as much ability to iterate against your security measures (because if you catch them trying to break your security measures, you can often fire them or have them arrested).

Of course, there's a spectrum here. A big part of how you mitigate threats from external hackers is having security staff investigate weird anomalies, in the hope of catching hackers who have broken some of your intended security boundaries. And a big part of how you mitigate insider threats is building safe administrative APIs, so that instead of your employees having to have the ability to run arbitrary code on production machines, you allow them to usually do their job by taking pre-specified, known-safe actions. And there’s much more along these lines.
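As a hypothetical sketch of that last idea (the actions and API are invented for illustration), a safe administrative API exposes a small allowlist of parameterized, logged actions, so employees can do most of their day-to-day work without ever needing the ability to run arbitrary code in production.

```python
# Hypothetical sketch of a safe administrative API: employees invoke
# pre-specified, known-safe actions instead of running arbitrary code on
# production machines, and every call is logged for later review.

import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("admin_api")


def restart_service(service_name: str) -> str:
    return f"restarted {service_name}"


def resend_verification_email(user_id: str) -> str:
    return f"verification email queued for {user_id}"


# The allowlist is the whole attack surface: nothing outside it is reachable.
ALLOWED_ACTIONS = {
    "restart_service": restart_service,
    "resend_verification_email": resend_verification_email,
}


def run_admin_action(employee_id: str, action: str, **kwargs) -> str:
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"{action!r} is not a pre-approved admin action")
    log.info("employee=%s action=%s args=%s", employee_id, action, kwargs)
    return ALLOWED_ACTIONS[action](**kwargs)


# Usage:
print(run_admin_action("dana", "restart_service", service_name="billing"))
```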

I feel like there’s a really important distinction between “security from insiders” and “security from outsiders”, and I don’t really get why security people don’t seem to find this distinction as important.

Now that I have this concept, I can make a few points about the risks posed by AIs deployed inside AI companies. (Risks from internal deployments have been discussed by me here (as well as many other places), by a Redwood/UK AISI collaboration here, and by Apollo here.)

First, when I talk to security staff at AI companies about computer security, they often seem to fail to anticipate what insider threat from AIs will be like. In particular, when discussing how they'll deploy AI engineers and researchers internally, they often seem to imagine relating to those AIs mostly as outsiders rather than as insiders. They imagine writing small and carefully locked-down infrastructure and allowing the AIs to interact with it. But my guess is that in order to do their jobs, AIs will need to have the kind of wide-ranging access that insiders have, which leads to having an attack surface that is large and can’t be protected with static methods. (I’m not sure I’m right about this, and it seems very important to get clarity on; I plan to write more about it soon.) I think the security teams at AI companies are not putting enough work into preparing for this.

And second, insider threat from AI combines the hard parts of both problems! As with insider threats, the AIs will need extremely broad, fine-grained access. And as with outsider threats, you're outnumbered by your potential adversaries. (Of course, you have a bunch of advantages too, and that's why we research AI control rather than just writing off the problem as hopeless.)


