Contest for Better AGI Safety Plans

 

This article introduces a contest aimed at addressing AI safety risks, with particular attention to the possibility that AGI (artificial general intelligence) arrives within 2-4 years. The contest encourages detailed AI safety plans that fill gaps in existing proposals and red-team response strategies. Through prizes, expert feedback, and public discussion, it aims to promote deeper thinking about AGI safety and governance measures and to lay the foundation for follow-up work.

💡 The contest's goal is to create clarity around AGI safety strategy by soliciting proposals that make implementation details concrete, identify potential failure modes, and suggest improvements.

🏆 Submissions will be incentivized with prizes, prestige, and recognition; contributions can range from full plans to tactical modules addressing key parts of a plan, with room to reward a wide variety of useful submissions.

🧐 The contest will focus attention on existing plans and the hard parts of the problem, accepting submissions that highlight improvements or crucial considerations, especially regarding alignment and governance plans.

💰 The contest budget comes in three tiers: a minimum budget of $50-75k, a medium budget of $100k, and an ambitious budget of $250k, covering prizes, operations, promotion, and a safety buffer.

Published on July 3, 2025 5:02 PM GMT

Recently there has been some discussion around AI safety plans, so I was inspired to work on this and thought it might be relevant.

Project summary


Timelines to AGI could be very short (2-4 years) and we need solid plans to address the risks. Although some useful general outlines of plans and a few detailed scenario analyses have been produced, comprehensive plans remain underspecified. This project will establish a prestigious contest that elicits proposals which address short-timeline scenarios, fill in the gaps of existing plans, and red-team AGI safety strategies. Proposals should address the strong possibility of short AGI timelines (2-4 years), no major safety breakthroughs during that time, and uncertainty about future events. The winning proposals will receive expert feedback from the judges with the goal of refining and synthesizing the best parts, followed by broader public dissemination and discussion that can lay the foundation for follow-up work. This contest will advance discussion and preparation of AGI safety and governance measures by identifying gaps and failure-mode mitigations for existing strategies and theories of victory.
 

What are this project's goals? How will you achieve them?

    Creating clarity around AGI safety strategy. This contest will elicit a wide variety of proposals for making implementation details concrete, clarifying potential failure modes, and suggesting improvements.
    Incentivizing submissions with prizes, prestige, and recognition. Contributions can range from full plans to tactical modules that address a key part of a plan. We will ensure room for rewarding a wide variety of useful submissions.
    Focusing attention on existing plans and the hard parts of the problem. Specifically, this contest will accept submissions that highlight improvements or crucial considerations regarding both alignment and governance plans. Two examples of strong plans that can serve as a base to start engaging with these questions are:
    Targeting outreach toward people well suited to build on and engage with plans. The contest will solicit submissions from experts in planning, AI safety, governance, and strategy, while also maintaining an open submission process to discover novel talent and ideas.
     

The contest will aim for submissions which are useful and adaptable.

More details can be found in Draft Submission Guidelines & Judging Criteria below. 

Why a contest for plans? 

    Contests have previously led to useful AI safety contributions. The Eliciting Latent Knowledge contest was successful at eliciting counterarguments and key considerations that the hosts hadn’t considered even after hundreds of hours of thought. They found these submissions valuable enough to pay out $275,000. Directing this kind of attention at AI safety plans when we may live in a short-timeline world seems prudent and worth trying.
    Incentivizing planning for AGI safety, which is a distinct type of work and skillset. By default we should expect not enough planning to happen. History shows society generally underprepares for the possibility of dramatic changes (e.g., Covid-19). We also should not assume that good technical or research takes are enough for sufficient strategic takes; planning is a distinct action and skillset. It’s important to invest time specifically developing strategic clarity by specifying priorities, plan details, and limitations. When we don’t plan adequately for future scenarios, we are forced to improvise under time constraints. If we had some plans in advance of Covid-19, we probably could have done much better: an early warning system, low-cost near-perfect PPE, better contact tracing, and faster vaccine manufacturing.
    There are several plausible pathways for improving AI safety plans. Some ways this might work:
        Having a bunch of people surfacing crucial considerations and assessing alternate scenarios can make a plan more useful.
        You could have a plan for the right scenarios, but be missing details that would make the plan easier, more effective, or more robust.
        There could be good proposals for parts of the problem, but it’s unclear if they are solving the hard parts. We need plans to at least address those hard parts even if they aren’t all fully solved.
         

How will this funding be used?

Budget at a glance

    Minimum budget: ~$50-75k
    Medium budget: ~$100k
    Ambitious budget: ~$250k


Detailed Budgets

Minimum budget ~50-75k

    Prizes 15-25k
      10-14k first
      3-6k second
      1-2k third
      $500 each for 10 honorable mentions
    Operations 15-20k
      Average 15 h/week at $48.76/h for 6 months (26 weeks); probably more front-loaded
        Scope out strategy, talk to previous contest hosts, set up logistics, secure judges, advertise to groups, headhunt promising participants, answer questions, coordinate dissemination strategy
        Improve submission guidelines and judging criteria, talk to more people and get feedback on this document
    Logistics: ~2-5k
      Setting up submission system
      Website
      Targeted paid promotion
      Includes budget for a potential award ceremony, which may be virtual
    Disseminating and marketing/popularizing winning proposals ~5-10k
    Budget safety buffer ~5k (7.5-10%)
    Preliminary review / shortlisting the best plans for thorough judge review
      Estimate: 100 submissions × 30 min review × 3 reviewers × $50/hr = 7.5-10k
      Judge discussions to surface valuable insights as well
    What we can do with additional funds
      Larger prizes, more honoraria for judges, marketing/disseminating the winning proposals, expanded outreach and participant support, maybe a retreat for the winners
     

Medium budget ~100k

Same as minimum budget but with larger prizes & honoraria for judges

    Prizes 30-50k
      20-30k first
      6-10k second
      2-4k third
      $500 each for 12 honorable mentions
    Honoraria for judges
      Estimate: 100 submissions × 45 min review × 3-5 reviewers × $50/hr = 18-25k

Ambitious budget ~250k

Changes from the medium budget:

    Prizes 60-100k
      40-50k first
      12-24k second
      4-8k third
      $500 each for 20 honorable mentions
    Operations 20-25k
    Logistics: ~15-25k
      Setting up submission system
      Website
      Targeted paid promotion
      Retreat for winners to further refine their proposals
    Disseminating and marketing/popularizing winning proposals ~30-50k
    Budget safety buffer ~25k (7.5-10%)
    Honoraria for judges
      Estimate: 100 submissions × 45 min review × 3-5 reviewers × $50/hr = 18-25k (a rough worked version of these reviewer-cost estimates follows below)
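
As a sanity check, here is a minimal sketch of the reviewer-cost arithmetic behind the estimates above. It is not part of the original proposal: the function and parameter names are illustrative, and the gap between the computed figures and the budgeted 18-25k range is assumed to be covered by the judge discussions mentioned above.

```python
# Hypothetical sketch of the reviewer-cost estimates quoted above; names are illustrative.

def review_cost(n_submissions: int, minutes_per_review: float, n_reviewers: int, hourly_rate: float) -> float:
    """Total cost of preliminary review, in dollars."""
    total_hours = n_submissions * (minutes_per_review / 60) * n_reviewers
    return total_hours * hourly_rate

# Minimum budget: 100 submissions x 30 min x 3 reviewers x $50/hr
print(review_cost(100, 30, 3, 50))   # 7500.0 -> the low end of the ~7.5-10k estimate

# Medium/ambitious budgets: 100 submissions x 45 min x 3-5 reviewers x $50/hr
print(review_cost(100, 45, 3, 50))   # 11250.0
print(review_cost(100, 45, 5, 50))   # 18750.0 -> near the stated 18-25k once judge discussion time is added
```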
       

Who is on your team? What's your track record on similar projects?


Main Organizer

Peter Gebauer: technical AI governance research manager for ERA in Cambridge (this project is independent and unaffiliated with them); previously directed the Supervised Program for Alignment Research, completed the GovAI summer 2024 fellowship, co-authored Dynamic Safety Cases for Frontier AI, helped with recruiting at Anthropic, and managed communications for a statewide political campaign. 

Advisors

    Ryan Kidd
    Seth Herd


Draft Submission Guidelines

In this contest, we invite your takes on the big picture: if transformative AI is developed soon, how might the world overall navigate that, to reach good outcomes?

We think that tackling this head on could help force us to tackle the difficult and messy parts of the problem. It could help us to look at things holistically, and better align on what might be especially important. And it could help us to start to build more shared visions of what robustly good paths might look like.

Of course, any real trajectories will be highly complex and may involve new technologies that are developed as AI takes off. What is written now will not be able to capture all of that nuance. Nonetheless, we think that there is value in trying.

The format for written submissions is at your discretion, but we ask that you at least touch on each of the following points:

Anticipated challenges and mitigations:

    Challenge: people don’t submit
      Solutions
        Prestigious judges
        Lots of smaller prizes to reward many types of valuable submissions
        Larger prizes overall
        Commitment to diffusing the winning ideas and generating positive impact from them
    Challenge: bad submissions
      Solutions
        Clear criteria for useful submissions
        Reach out to experienced people
        Allow for multiple kinds of submissions - proposing and improving plans
        Facilitate connections between people with complementary expertise and interests
    Challenge: people might delay releasing useful work because of this
      Solution
        Include prizes for recently published work that would have scored highly in the contest
    Challenge: bad selection of judges
      Solutions
        Choose highly respected judges with a track record of solid publications and community standing
        Diverse array of backgrounds and expertise for judging each submission
    Challenge: senior researchers afraid to submit
      Solutions
        Framing the contest as exploratory and collaborative
        Non-binding participation: make it clear participation does not mean an endorsement of a particular timeline or p(doom)
        Private evaluation - only finalist or winning submissions become public
        Ensure judges give constructive feedback
        Anonymous submission option
    Challenge: wasting judges’ time
      Solution
        Prescreen submissions for minimum standards based on judging criteria
    Challenge: inconsistent judging methodology
      Solutions
        Standardized rubric
        Two judges per finalist submission, with judge scores normalized (see the sketch after this list)
    Challenge: conflicts of interest (COIs) implicate judges
      Solutions
        Filter for COIs
        Double-blind evaluation
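
Below is a minimal sketch of one way the score-normalization step above could work, assuming per-judge z-score normalization (the post does not specify a method); the function name and data layout are illustrative rather than a fixed part of the contest design.

```python
import statistics

def normalize_scores(scores_by_judge):
    """Z-score each judge's raw scores so that judges with different calibrations
    become comparable, then average the normalized scores per submission.

    scores_by_judge: {judge_name: {submission_id: raw_score}}
    Returns {submission_id: mean normalized score}.
    """
    normalized = {}  # submission_id -> list of normalized scores
    for judge, scores in scores_by_judge.items():
        values = list(scores.values())
        mean = statistics.mean(values)
        spread = statistics.pstdev(values) or 1.0  # guard against a judge giving identical scores
        for submission, raw in scores.items():
            normalized.setdefault(submission, []).append((raw - mean) / spread)
    return {s: statistics.mean(v) for s, v in normalized.items()}

# Example: two judges per finalist with different raw-score calibrations still agree on the ranking.
print(normalize_scores({
    "judge_a": {"plan_1": 8, "plan_2": 6, "plan_3": 7},
    "judge_b": {"plan_1": 5, "plan_2": 3, "plan_3": 4},
}))
```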
       

We think this should produce submissions that at least fill some gaps in current limited plans, and generate more collective engagement with the theoretical and practical challenges of AGI on the current trajectory.
 

How much money have you raised in the last 12 months, and from where?

None
 

Other FAQs

Why not just get more talent to work on AI safety?

    It’s important to know how to allocate talent in order to achieve victory conditions for AGI going well. Talent is limited and must be allocated effectively. Increasing our strategic clarity is useful for knowing how to allocate talent and how the plethora of work in AI safety and governance can fit together effectively. Figuring out ways to attract, develop, and direct talent is itself an essential part of strategic planning. Finally, a contest can also attract new people.

Don’t contests generally fail?

    Contests are generally high variance and useful for getting a wide variety of ideas you might not have considered. The Eliciting Latent Knowledge contest seems like the most successful AI safety contest: it surfaced many novel key considerations and paid out $275,000 total for contributions. The Open Philanthropy Worldview Investigations contest also resulted in some excellent analyses. Some pitfalls of previous contests: too short, too broad, too targeted at people who are relatively inexperienced. This contest will be higher profile than competitions like a weekend research sprint that targets AI safety newcomers with small prizes.

Why this contest?

    Timelines to transformative AI could be short: 2-4 years or less.
    There are few comprehensive plans for making the development of superhuman AI systems go well.
    A well-designed contest can encourage talented researchers to develop strong proposals we can build on and advocate for.
    Past contests have elicited useful submissions (e.g., Boaz Barak in the Open Phil worldview contest; Victoria Krakovna, Mary Phuong, and others in ELK) and identified new talent (e.g., Quintin Pope in the OP worldview contest; James Lucassen and Oam Patel in ELK).
    Writing down plans and sharing them helps us identify gaps and spot areas for improvement together. For a more thorough overview of extreme AI risks, we recommend reading Managing Extreme AI Risks Amid Rapid Progress.

Do we expect perfect plans?

    This contest recognizes that AI safety is a complex domain where no single approach will be sufficient. Plans may not always work, but planning is essential. All submissions are understood to be exploratory contributions to a collaborative field rather than definitive solutions. Participants can choose their level of public association with their work, and all feedback will be moderated to ensure it focuses on improving plans rather than criticizing contributors.

What’s the contest timeline? 

    Once set up, we expect to allow 3 months for submissions and close submissions in late October. We will announce winners about a month after. 

What are some examples of the kind of work you're interested in?

Examples of Work That Provides Strategic Clarity on AI Safety Plans

    “Plans” or rather pieces of plans that we are glad exist because they provide strategic clarity, details about a key component of a plan, or requirements of a comprehensive plan:
      The Checklist: What Succeeding at AI Safety Will Involve (Sam Bowman)
      What’s the Short Timeline Plan?
      How do we solve the alignment problem?
      AI Strategy Nearcasting
      A Vision for a “CERN for AI”
      Building CERN for AI - An Institutional Blueprint
      CERN for AI
      Intelsat as a Model for International AGI Governance
      What Does it Take to Catch a Chinchilla?
      Survey on Intermediate Goals in AI Governance
      A sketch of an AI control safety case
    “Scenarios” we are glad exist:
      How AI Takeover Might Happen in 2 Years
      A History of the Future: 2025-2040
      AI 2027 (Kokotajlo et al.)
    What makes these good?
      Describe a particular scenario for TAI takeoff
      Detail the transition from pre-TAI to post-TAI
      List a concrete series of events or recommended actions
      Justify and explain their threat models and risk factors
      Cite a well-defined problem (e.g., AI-designed mirror pathogens)
      Cited problems fit with the scenario they have established
      Clear takeaways:
        We should be concerned about/watch out for x
        We should do y if z happens
        These fit the scenario established
        Ideally takeaways seem robustly good across multiple scenarios
      Detailed combinations of techniques that aid AI alignment/control

Some additional FAQs are addressed in a Google Doc.

Why might timelines be short? (See Appendix A)

Who would the judges be? (See Appendix B for possible ones we’d be excited about)

Possible judging criteria? (See Appendix C)


