Published on May 1, 2025 12:33 AM GMT
To make frontier AI safe enough, we need to "lift up the floor" with minimum safety practices
Anthropic has popularized the idea of a “race to the top” in AI safety: Show you can be a leading AI developer while still prioritizing safety. Make safety a competitive differentiator, which pressures other developers to be safe too. Spurring a race to the top is core to Anthropic’s mission, according to its co-founder and CTO.1
Is the race to the top working?
Competitive pressures can lead to some safety improvements.
When models aren’t reliable or trustworthy, customers get upset, and this creates pressure to fix problems. The CEO tweets on a Sunday, “we are working on fixes asap, some today.”
Currently both OpenAI and Anthropic are dealing with trustworthiness issues in their models—issues neither company wants. And once one finds a fix, competitive pressure will mount on the other to quickly find a fix as well.
So, does that mean we can count on a race to the top to keep us all safe?
Not really. “Creating pressure to be safer” is different from “Making sure nobody acts unsafely.”
A race to the top can improve AI safety, but it doesn’t solve the “adoption problem”—getting all relevant developers to adopt safe enough practices.
For instance, Anthropic often cites its safety framework as evidence of how a race to the top can work—but Meta didn’t publish their own safety framework until 17 months later. If AI safety is a race to the top, it’s not a very fast one.
To Anthropic’s credit, they have not claimed that a race to the top is sufficient for AI safety, at least not to my knowledge. But media coverage sometimes suggests otherwise3—perhaps because Anthropic doesn’t have a paired phrase that emphasizes the need for regulation.
It’s an important point, and so it bears saying clearly:
A “race to the top” must be paired with “lifting up the floor.” As AI systems become more capable, it is dangerous to rely on competitive pressures for getting frontier AI developers to adopt safe enough practices.4
In this post, I will:
- Define Anthropic’s philosophy of “race to the top” (including an Appendix with detailed excerpts from Anthropic)
- Explain why frontier AI safety relies on “lifting up the floor”
- Explain why we shouldn’t rely on a “race to the top” being sufficient
  - Market forces won’t lead to universal adoption of safety practices
  - The safety practices desired by the market won’t be strong enough to stop catastrophic risks
Continues here: https://stevenadler.substack.com/p/dont-rely-on-a-race-to-the-top
Twitter thread: https://x.com/sjgadler/status/1917649337519333612