If the most serious threat model for AI systems is internal deployment, then the correct governance response is to establish independent legal supervisors for frontier AI labs[1].
Steven Adler recently argued against relying on a "race to the top," where frontier labs compete to be the safest when deploying models:
‘A race to the top can improve AI safety, but it doesn’t solve the “adoption problem”: getting all relevant developers to adopt safe enough practices.’
We should be sceptical of actors racing to build what could become the most powerful technology in human history while promising to compete on safety. There is already plenty of evidence that this scepticism is warranted. Google did not release model evaluations for Gemini 2.5 until after they were publicly criticised, and it is unclear whether they ever intended to. OpenAI, meanwhile, is attempting to convert away from its non-profit structure and has made several other choices that undermine safety.
If a serious race to the top were truly underway, we might as well go home.
But we cannot go home.
If these labs can’t be trusted, then someone else must keep them in check. So where are the checks and balances for AI labs?
Internal structures like boards or long-term governance trusts do have some power to provide oversight. But they are too far removed to catch problems early, and too reliant on information the lab chooses to share.
What is needed is a higher standard of scrutiny: deeper inspection, independent testing, and people with direct access to the models and their behaviour.
This level of oversight requires more than internal governance. It calls for an independent body with a clear legal mandate: not friendly auditors chosen by the labs, but supervisors with the authority to inspect systems, test for risks like scheming, and intervene before deployment. They must have the power to seek warrants, demand documentation, and compel access when necessary. Legislation should grant these powers and set clear boundaries for how they are used.
There is good reason to be sceptical of applying existing legal frameworks to frontier AI. Taking something off the shelf, like liability or contract law, probably will not work by default. And we should also be sceptical that the law can keep pace with how quickly this technology is moving. 'Statutory law and case law never keep up with Moore’s law.'
But that is not a reason to give up on legal approaches altogether. It is a reason to ask how the law might be reworked to meet the scale and urgency of the challenge.
In a world where the law is reworked for this purpose, the case for legal supervision becomes clearer. Supervisors should have the authority to act when necessary to keep this technology safe. That includes access to training runs, compulsory evaluations, compute-stop orders, civil fines, and the ability to seek emergency injunctive relief.
A more immediate reason to introduce supervision is the shift toward internal-only deployment. Labs may choose not to release their most capable models at all. Instead, they can publish weaker systems to maintain an income stream, while using more powerful models internally to accelerate research toward superintelligence. This approach reduces legal risk, keeps competitors in the dark, and centralises control.
As deployment becomes more internal, secrecy is likely to grow. Labs may stop sharing models with external evaluators entirely. If that happens, the UK AI Security Institute will lose meaningful access. So will groups like Apollo Research and METR, whose evaluations depend on cooperation from the labs.
A deceptive model may evade detection, especially without external testing. Without independent scrutiny, dangerous behaviours can go unnoticed. Concentrating this level of capability in one organisation without oversight poses risks not just to the lab but to humanity as a whole.
Apollo Research's paper addresses the risks of internal deployment and proposes two internal oversight bodies for labs that choose not to release their models publicly.
The first is the Internal Deployment Team (IDT), a technical group within the lab tasked with testing models, installing access limits and other safeguards, and monitoring their use. The second is the Internal Deployment Overseeing Board (IDOB), a board appointed by the lab to approve or block deployments, set usage rules, and audit the IDT’s enforcement.
Apollo suggests that the IDT could be made up of existing staff or new hires selected by the lab. This approach should raise concerns. It accepts that labs can supervise themselves. At best, this could be part of a broader solution. But there must also be independent supervision from the state. It is unwise to trust a system that relies on AI labs to self-regulate in matters of safety and security.
The EU AI Office offers one example of what state-level supervision could look like. Under the EU AI Act, it can demand documentation, carry out independent evaluations, mandate risk mitigation, recall or withdraw models, and issue fines of up to 15 million euros or 3 percent of global turnover.
This is a step in the right direction. But the Act only applies when a model affects people in the EU. If a system is developed and deployed entirely within a lab, without ever interacting with EU citizens, the law does not apply.
The more pressing concern is what happens when powerful models are never released. If labs keep their most advanced systems internal, many governance approaches may never be triggered. Legal frameworks built around public deployment could become irrelevant. This is why supervision needs to happen earlier. It must be proactive. It must take place during development. And it must exist in the United States, where the frontier labs are based.
Some important questions remain. How will supervisors know what to look for? How will they know when to intervene, or when to call out dangerous behaviour inside a lab?[2]
But the core claim stands. The path to superintelligence carries serious risks, and supervision is one way of trying to contain them. When labs no longer have an incentive to release their models publicly, there still needs to be a watchdog.
[1] See Peter Wills’ paper for more on operationalising supervisors.
[2] An additional question is what supervision looks like once advanced systems begin doing AI R&D themselves. If developers are replaced by highly capable agents improving their own designs, what exactly are supervisors overseeing? How do you supervise the agents?