AISN #56: Google Releases Veo 3

This edition of the AI Safety Newsletter focuses on recent major developments in AI, including Google's release of its next-generation video generation model Veo 3 and the concerns about voluntary governance raised by Anthropic's Claude Opus 4. The article examines Veo 3's performance gains and potential impact, analyzes changes to Anthropic's model-safety commitments, and highlights the fragility of voluntary governance amid rapid AI progress. The newsletter also briefly covers other AI news, such as upgrades to the Gemini model family and OpenAI's acquisition of Jony Ive's startup.

📹 Google released Veo 3, a new video generation model that performs strongly on human preference benchmarks and generates high-quality video and audio. Veo 3's marked visual improvement may signal video generation's transition from "interesting toy" to "genuinely useful."

⚠️ Before releasing Claude Opus 4, Anthropic abandoned several safety commitments, raising questions about the effectiveness of voluntary governance. Opus 4 exhibits potentially dangerous dual-use capabilities, showing a clear uplift in its ability to help malicious actors acquire biological weapons.

🛡️ Although Anthropic rolled out ASL-3 safety protections for Opus 4, early feedback suggests they may be insufficient. For example, one researcher found that Claude Opus 4's WMD safeguards could be bypassed to generate 15 pages of detailed instructions for producing sarin.

🔄 Before releasing Opus 4, Anthropic revised its safety and security policies, for example removing the requirement to define detailed ASL-4 "warning sign evaluations." Anthropic also relaxed its ASL-3 security requirements, no longer requiring protection against employees stealing model weights.

Published on May 28, 2025 4:00 PM GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition: Google released a frontier video generation model at its annual developer conference; Anthropic’s Claude Opus 4 demonstrates the danger of relying on voluntary governance.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

Subscribe to receive future versions.


Google Releases Veo 3

Last week, Google made several AI announcements at I/O 2025, its annual developer conference. An announcement of particular note is Veo 3, Google’s newest video generation model.

Frontier video and audio generation. Veo 3 outperforms other models on human preference benchmarks, and generates both audio and video.

Google showcasing a video generated with Veo 3. (Source)

If you just look at benchmarks, Veo 3 is a substantial improvement over other systems. But relative benchmark gains tell only part of the story; a system's absolute capabilities ultimately determine its usefulness. Veo 3 looks like a marked qualitative improvement over other models, generating video and audio with remarkable fidelity, and we recommend you see some examples for yourself. Veo 3 may represent the point at which video generation crosses the line from interesting toy to genuinely useful tool.
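
Human preference benchmarks of this kind typically rank models from pairwise comparisons: raters watch two clips generated from the same prompt, pick the one they prefer, and an Elo-style rating is fit to the outcomes. Here is a minimal illustrative sketch of that scoring scheme; the matchup data is hypothetical, not Veo 3's actual benchmark results:

```python
from collections import defaultdict

# Hypothetical rater judgments, one (winner, loser) pair per comparison.
# Illustrative only; not real benchmark data.
judgments = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_a", "model_b"),
]

ratings = defaultdict(lambda: 1000.0)  # every model starts at 1000 Elo
K = 32  # update step size

for winner, loser in judgments:
    # Expected score for the winner under the Elo logistic model.
    expected = 1 / (1 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
    ratings[winner] += K * (1 - expected)
    ratings[loser] -= K * (1 - expected)

# Higher rating means raters preferred that model more often.
print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```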

Other announcements at I/O 2025. Highlights beyond Veo 3 included upgrades across the Gemini model family and a broad slate of new AI features across Google's products.

AI is here to stay. AI use is sometimes driven by trends; for example, ChatGPT added a million users in an hour during the ‘Ghiblification’ craze. However, as AI systems become genuinely useful across more tasks, they will become ubiquitous and enduring. Google’s Gemini app now has 400M monthly active users, and its AI products process over 480 trillion tokens a month, up from 9.7 trillion last year.
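
For a sense of scale, that throughput figure implies roughly a fiftyfold year-over-year increase. A quick check of the arithmetic, using only the two numbers quoted above:

```python
# Monthly token throughput reported by Google at I/O 2025.
tokens_last_year = 9.7e12    # 9.7 trillion tokens/month
tokens_now = 480e12          # 480 trillion tokens/month

growth = tokens_now / tokens_last_year
print(f"Year-over-year growth: ~{growth:.0f}x")  # ~49x
```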

Opus 4 Demonstrates the Fragility of Voluntary Governance

Last week, Anthropic released Claude Opus 4 and Claude Sonnet 4. Both models perform at a broadly frontier level and lead the field on coding benchmarks. Claude Opus 4 is also the first model Anthropic has released under its ASL-3 safety standard, which applies to models that pose substantially increased risk. However, Anthropic rolled back several safety and security commitments prior to releasing Opus 4, demonstrating that voluntary governance is not to be relied on.

Opus 4 exhibits hazardous dual-use capabilities. In one result from its system card, Opus 4 provides a clear uplift in trials measuring its ability to help malicious actors acquire biological weapons.

Anthropic’s Chief Scientist Jared Kaplan told TIME that malicious actors could use Opus 4 to “try to synthesize something like COVID or a more dangerous version of the flu—and basically, our modeling suggests that this might be possible.” It’s not just Opus 4: several frontier models outperform human experts in dual-use virology tests.
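
In trials like these, “uplift” is usually quantified by comparing participants who have model access against a control group using only the internet. The sketch below shows that comparison with made-up scores; these numbers are hypothetical and do not come from Anthropic’s system card:

```python
# Hypothetical scores on a 0-100 task rubric; NOT actual trial data.
control_scores = [12, 18, 15, 10, 20]  # internet-only participants
model_scores = [35, 42, 30, 38, 45]    # participants with model access

def mean(xs):
    return sum(xs) / len(xs)

uplift = mean(model_scores) / mean(control_scores)
print(f"Mean uplift: ~{uplift:.1f}x relative to the control group")  # ~2.5x
```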

The system card also reports that Apollo Research found an early version of Claude Opus 4 exhibited “scheming and deception” and advised against its release. Anthropic says it implemented internal fixes; however, it doesn’t appear that Anthropic had Apollo Research re-evaluate the final, released version.

Anthropic’s safety protections may be insufficient. In light of Opus 4’s dangerous capabilities, Anthropic rolled out ASL-3 safety protections. However, early public response to Opus 4 indicates that those protections might be insufficient. For example, one researcher showed that Claude Opus 4's WMD safeguards can be bypassed to generate over 15 pages of detailed instructions for producing sarin gas.

Anthropic walked back safety and security commitments prior to Opus 4’s release. Anthropic has faced criticism for these rollbacks. For example, its September 2023 Responsible Scaling Policy (RSP) committed to defining detailed ASL-4 “warning sign evaluations” before its systems reached ASL-3 capabilities, but Anthropic had not done so at the time of Opus 4’s release: it removed that requirement in an October 2024 revision to its RSP.

Anthropic also weakened its ASL-3 security requirements shortly before Opus 4’s ASL-3 announcement: the policy no longer requires robustness against employees stealing model weights if those employees already have access to “systems that process model weights.”

Voluntary governance is fragile. Whether or not Anthropic’s changes to its safety and security policies are justified, voluntary commitments are not sufficient to ensure model releases are safe. There’s nothing stopping Anthropic or other AI companies from walking back critical commitments in the face of competitive pressure to rush releases.

Government

Industry

Civil Society

See also: CAIS’ X account, our paper on superintelligence strategy, our AI safety course, and AI Frontiers, a new platform for expert commentary and analysis.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

Subscribe to receive future versions.



