Contest for Better AGI Safety Plans

 

This article introduces a contest aimed at addressing AI safety risks, with particular attention to the possibility that AGI (artificial general intelligence) arrives within 2-4 years. The contest encourages detailed AI safety plans that fill gaps in existing proposals and red-team response strategies. Through prizes, expert feedback, and public discussion, it aims to promote deeper thinking about AGI safety and governance measures and to lay the foundation for follow-up work.

💡 The contest's goal is to create clarity around AGI safety strategy by soliciting proposals that make implementation details concrete, identify potential failure modes, and suggest improvements.

🏆 Submissions will be incentivized with prizes, prestige, and recognition; contributions can range from full plans to tactical modules addressing key parts of a plan, with room to reward a wide variety of useful submissions.

🧐 The contest will focus attention on existing plans and the hard parts of the problem, accepting submissions that highlight improvements or crucial considerations, especially regarding alignment and governance plans.

💰 The contest budget comes in three tiers: a minimum budget of $50-75k, a medium budget of $100k, and an ambitious budget of $250k, covering prizes, operations, promotion, and a safety buffer.

Published on July 3, 2025 5:02 PM GMT

Recently there has been some discussion around AI safety plans, so I was inspired to work on this and thought it might be relevant.

Project summary


Timelines to AGI could be very short (2-4 years) and we need solid plans to address the risks. Although some useful general outlines of plans and a few detailed scenario analyses have been produced, comprehensive plans remain underspecified. This project will establish a prestigious contest that elicits proposals which address short-timeline scenarios, fill in the gaps of existing plans, and red-team AGI safety strategies. Proposals should address the strong possibility of short AGI timelines (2-4 years), no major safety breakthroughs during that time, and uncertainty about future events. The winning proposals will receive expert feedback from the judges with the goal of refining and synthesizing the best parts, followed by broader public dissemination and discussion that can lay the foundation for follow-up work. This contest will advance discussion and preparation of AGI safety and governance measures by identifying gaps and failure-mode mitigations for existing strategies and theories of victory.
 

What are this project's goals? How will you achieve them?

    Creating clarity around AGI safety strategy. This contest will elicit a wide variety of proposals for making implementation details concrete, clarifying potential failure modes, and suggesting improvements.
    Incentivizing submissions with prizes, prestige, and recognition. Contributions can range from full plans to tactical modules that address a key part of a plan. We will ensure room for rewarding a wide variety of useful submissions.
    Focusing attention on existing plans and the hard parts of the problem. Specifically, this contest will accept submissions that highlight improvements or crucial considerations regarding both alignment and governance plans. Two examples of strong plans that can serve as a base to start engaging with these questions are:
    Targeting outreach toward people well suited to build on and engage with plans. The contest will solicit submissions from experts in planning, AI safety, governance, and strategy, while also maintaining an open submission process to discover novel talent and ideas.
     

The contest will aim for submissions which are useful and adaptable.

More details can be found in Draft Submission Guidelines & Judging Criteria below. 

Why a contest for plans? 

    Contests have previously led to useful AI safety contributions. The Eliciting Latent Knowledge contest was successful at eliciting counterarguments and key considerations that the hosts hadn’t considered even after hundreds of hours of thought. They found these submissions valuable enough to pay out $275,000. Directing this kind of attention at AI safety plans when we may live in a short-timeline world seems prudent and worth trying.
    Incentivizing planning for AGI safety, which is a distinct type of work and skillset. By default we should expect not enough planning to happen. History shows society generally underprepares for the possibility of dramatic changes (e.g., Covid-19). We also should not assume that good technical or research takes are enough for sufficient strategic takes; planning is a distinct action and skillset. It’s important to invest time specifically developing strategic clarity by specifying priorities, plan details, and limitations. When we don’t plan adequately for future scenarios, we are forced to improvise under time constraints. If we had some plans in advance of Covid-19, we probably could have done much better: an early warning system, low-cost near-perfect PPE, better contact tracing, and faster vaccine manufacturing.
    There are several plausible pathways for improving AI safety plans. Some ways this might work:
        Having a bunch of people surfacing crucial considerations and assessing alternate scenarios can make a plan more useful.
        You could have a plan for the right scenarios, but be missing details that would make the plan easier, more effective, or more robust.
        There could be good proposals for parts of the problem, but it’s unclear if they are solving the hard parts. We need plans to at least address those hard parts even if they aren’t all fully solved.
         

How will this funding be used?

Budget at a glance

    Minimum budget: ~$50-75k
    Medium budget: ~$100k
    Ambitious budget: ~$250k


Detailed Budgets

Minimum budget ~50-75k

    Prizes 15-25k
      10-14k first
      3-6k second
      1-2k third
      $500 each for 10 honorable mentions
    Operations 15-20k
      Average 15 h/week at $48.76/h for 6 months (26 weeks); probably more front-loaded
        Scope out strategy, talk to previous contest hosts, set up logistics, secure judges, advertise to groups, headhunt promising participants, answer questions, coordinate dissemination strategy
        Improve submission guidelines and judging criteria, talk to more people and get feedback on this document
    Logistics: ~2-5k
      Setting up submission system
      Website
      Targeted paid promotion
      Includes budget for a potential award ceremony, which may be virtual
    Disseminating and marketing/popularizing winning proposals ~5-10k
    Budget safety buffer ~5k (7.5-10%)
    Preliminary review / shortlisting the best plans for thorough judge review
      Estimate: 100 submissions × 30 min review × 3 reviewers × $50/hr = 7.5-10k
      Judge discussions to surface valuable insights as well
    What we can do with additional funds
      Larger prizes, more honoraria for judges, marketing/disseminating the winning proposals, expanded outreach and participant support, maybe a retreat for the winners
     

Medium budget ~100k

Same as minimum budget but with larger prizes & honoraria for judges

    Prizes 30-50k
      20-30k first
      6-10k second
      2-4k third
      $500 each for 12 honorable mentions
    Honoraria for judges
      Estimate: 100 submissions × 45 min review × 3-5 reviewers × $50/hr = 18-25k

Ambitious budget ~250k

Changes from the medium budget:

    Prizes 60-100k
      40-50k first
      12-24k second
      4-8k third
      $500 each for 20 honorable mentions
    Operations 20-25k
    Logistics: ~15-25k
      Setting up submission system
      Website
      Targeted paid promotion
      Retreat for winners to further refine their proposals
    Disseminating and marketing/popularizing winning proposals ~30-50k
    Budget safety buffer ~25k (7.5-10%)
    Honoraria for judges
      Estimate: 100 submissions × 45 min review × 3-5 reviewers × $50/hr = 18-25k (a rough worked version of these reviewer-cost estimates follows below)
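
As a sanity check, here is a minimal sketch of the reviewer-cost arithmetic behind the estimates above. It is not part of the original proposal: the function and parameter names are illustrative, and the gap between the computed figures and the budgeted 18-25k range is assumed to be covered by the judge discussions mentioned above.

```python
# Hypothetical sketch of the reviewer-cost estimates quoted above; names are illustrative.

def review_cost(n_submissions: int, minutes_per_review: float, n_reviewers: int, hourly_rate: float) -> float:
    """Total cost of preliminary review, in dollars."""
    total_hours = n_submissions * (minutes_per_review / 60) * n_reviewers
    return total_hours * hourly_rate

# Minimum budget: 100 submissions x 30 min x 3 reviewers x $50/hr
print(review_cost(100, 30, 3, 50))   # 7500.0 -> the low end of the ~7.5-10k estimate

# Medium/ambitious budgets: 100 submissions x 45 min x 3-5 reviewers x $50/hr
print(review_cost(100, 45, 3, 50))   # 11250.0
print(review_cost(100, 45, 5, 50))   # 18750.0 -> near the stated 18-25k once judge discussion time is added
```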
       

Who is on your team? What's your track record on similar projects?


Main Organizer

Peter Gebauer: technical AI governance research manager for ERA in Cambridge (this project is independent and unaffiliated with them); previously directed the Supervised Program for Alignment Research, completed the GovAI summer 2024 fellowship, co-authored Dynamic Safety Cases for Frontier AI, helped with recruiting at Anthropic, and managed communications for a statewide political campaign. 

Advisors

    Ryan Kidd
    Seth Herd


Draft Submission Guidelines

In this contest, we invite your takes on the big picture: if transformative AI is developed soon, how might the world overall navigate that, to reach good outcomes?

We think that tackling this head on could help force us to tackle the difficult and messy parts of the problem. It could help us to look at things holistically, and better align on what might be especially important. And it could help us to start to build more shared visions of what robustly good paths might look like.

Of course, any real trajectories will be highly complex and may involve new technologies that are developed as AI takes off. What is written now will not be able to capture all of that nuance. Nonetheless, we think that there is value in trying.

The format for written submissions is at your discretion, but we ask that you at least touch on each of the following points:

Anticipated challenges and mitigations:

    Challenge: people don’t submit
      Solutions
        Prestigious judges
        Lots of smaller prizes to reward many types of valuable submissions
        Larger prizes overall
        Commitment to diffusing the winning ideas and generating positive impact from them
    Challenge: bad submissions
      Solutions
        Clear criteria for useful submissions
        Reach out to experienced people
        Allow for multiple kinds of submissions - proposing and improving plans
        Facilitate connections between people with complementary expertise and interests
    Challenge: people might delay releasing useful work because of this
      Solution
        Include prizes for recently published work that would have scored highly in the contest
    Challenge: bad selection of judges
      Solutions
        Choose highly respected judges with a track record of solid publications and community standing
        Diverse array of backgrounds and expertise for judging each submission
    Challenge: senior researchers afraid to submit
      Solutions
        Framing the contest as exploratory and collaborative
        Non-binding participation: make it clear participation does not mean an endorsement of a particular timeline or p(doom)
        Private evaluation - only finalist or winning submissions become public
        Ensure judges give constructive feedback
        Anonymous submission option
    Challenge: wasting judges’ time
      Solution
        Prescreen submissions for minimum standards based on judging criteria
    Challenge: inconsistent judging methodology
      Solutions
        Standardized rubric
        Two judges per finalist submission, with judge scores normalized (see the sketch after this list)
    Challenge: conflicts of interest (COIs) implicate judges
      Solutions
        Filter for COIs
        Double-blind evaluation
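
Below is a minimal sketch of one way the score-normalization step above could work, assuming per-judge z-score normalization (the post does not specify a method); the function name and data layout are illustrative rather than a fixed part of the contest design.

```python
import statistics

def normalize_scores(scores_by_judge):
    """Z-score each judge's raw scores so that judges with different calibrations
    become comparable, then average the normalized scores per submission.

    scores_by_judge: {judge_name: {submission_id: raw_score}}
    Returns {submission_id: mean normalized score}.
    """
    normalized = {}  # submission_id -> list of normalized scores
    for judge, scores in scores_by_judge.items():
        values = list(scores.values())
        mean = statistics.mean(values)
        spread = statistics.pstdev(values) or 1.0  # guard against a judge giving identical scores
        for submission, raw in scores.items():
            normalized.setdefault(submission, []).append((raw - mean) / spread)
    return {s: statistics.mean(v) for s, v in normalized.items()}

# Example: two judges per finalist with different raw-score calibrations still agree on the ranking.
print(normalize_scores({
    "judge_a": {"plan_1": 8, "plan_2": 6, "plan_3": 7},
    "judge_b": {"plan_1": 5, "plan_2": 3, "plan_3": 4},
}))
```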
       

We think this should produce submissions that at least fill some gaps in current limited plans, and generate more collective engagement with the theoretical and practical challenges of AGI on the current trajectory.
 

How much money have you raised in the last 12 months, and from where?

None
 

Other FAQs

Why not just get more talent to work on AI safety?

    It’s important to know how to allocate talent in order to achieve victory conditions for AGI going well. Talent is limited and must be allocated effectively. Increasing our strategic clarity is useful for knowing how to allocate talent and how the plethora of work in AI safety and governance can fit together effectively. Figuring out ways to attract, develop, and direct talent is itself an essential part of strategic planning. Finally, a contest can also attract new people.

Don’t contests generally fail?

    Contests are generally high variance and useful for getting a wide variety of ideas you might not have considered. The Eliciting Latent Knowledge contest seems like the most successful AI safety contest: it surfaced many novel key considerations and paid out $275,000 total for contributions. The Open Philanthropy Worldview Investigations contest also resulted in some excellent analyses. Some pitfalls of previous contests: too short, too broad, too targeted at people who are relatively inexperienced. This contest will be higher profile than competitions like a weekend research sprint that targets AI safety newcomers with small prizes.

Why this contest?

    Timelines to transformative AI could be short: 2-4 years or less.
    There are few comprehensive plans for making the development of superhuman AI systems go well.
    A well-designed contest can encourage talented researchers to develop strong proposals we can build on and advocate for.
    Past contests have elicited useful submissions (e.g., Boaz Barak in the Open Phil worldview contest; Victoria Krakovna, Mary Phuong, and others in ELK) and identified new talent (e.g., Quintin Pope in the OP worldview contest; James Lucassen and Oam Patel in ELK).
    Writing down plans and sharing them helps us identify gaps and spot areas for improvement together. For a more thorough overview of extreme AI risks, we recommend reading Managing Extreme AI Risks Amid Rapid Progress.

Do we expect perfect plans?

    This contest recognizes that AI safety is a complex domain where no single approach will be sufficient. Plans may not always work, but planning is essential. All submissions are understood to be exploratory contributions to a collaborative field rather than definitive solutions. Participants can choose their level of public association with their work, and all feedback will be moderated to ensure it focuses on improving plans rather than criticizing contributors.

What’s the contest timeline? 

    Once set up, we expect to allow 3 months for submissions and close submissions in late October. We will announce winners about a month after. 

What are some examples of the kind of work you're interested in?

Examples of Work That Provides Strategic Clarity on AI Safety Plans

    “Plans” or rather pieces of plans that we are glad exist because they provide strategic clarity, details about a key component of a plan, or requirements of a comprehensive plan:
      The Checklist: What Succeeding at AI Safety Will Involve (Sam Bowman)
      What’s the Short Timeline Plan?
      How do we solve the alignment problem?
      AI Strategy Nearcasting
      A Vision for a “CERN for AI”
      Building CERN for AI - An Institutional Blueprint
      CERN for AI
      Intelsat as a Model for International AGI Governance
      What Does it Take to Catch a Chinchilla?
      Survey on Intermediate Goals in AI Governance
      A sketch of an AI control safety case
    “Scenarios” we are glad exist:
      How AI Takeover Might Happen in 2 Years
      A History of the Future: 2025-2040
      AI 2027 (Kokotajlo et al.)
    What makes these good?
      Describe a particular scenario for TAI takeoff
      Detail the transition from pre-TAI to post-TAI
      List a concrete series of events or recommended actions
      Justify and explain their threat models and risk factors
      Cite a well-defined problem (e.g., AI-designed mirror pathogens)
      Cited problems fit with the scenario they have established
      Clear takeaways:
        We should be concerned about/watch out for x
        We should do y if z happens
        These fit the scenario established
        Ideally takeaways seem robustly good across multiple scenarios
      Detailed combinations of techniques that aid AI alignment/control

Some additional FAQs are addressed in a Google Doc.

Why might timelines be short? (See Appendix A)

Who would the judges be? (See Appendix B for possible ones we’d be excited about)

Possible judging criteria? (See Appendix C)


