Directly Try Solving Alignment for 5 weeks

 


Published on July 21, 2025 9:51 PM GMT

The Moonshot Alignment Program is a 5-week research sprint from August 2nd to September 6th, focused on the hard part of alignment: finding methods that get an AI to do what we want and not what we don't want, and for which we have strong evidence that they will scale to superintelligence. You’ll join a small team, choose a vetted research direction, and run experiments to test whether your approach actually generalizes.

Mentors include: @Abram Demski @Cole Wyeth 

Research Assistants include: Leonard Piff, Péter Trócsányi

Apply before July 27th. The first 300 applicants are guaranteed personalised feedback. 166 applicants so far.

For this program, we have four main tracks:

    Agent Foundations Theory: Build formal models of agents and value formation.
    Applied Agent Foundations: Implement and test agent models.
    Neuroscience-based AI Alignment: Design architectures inspired by how the brain encodes values.
    Improved Preference Optimization: Build oversight methods that embed values deeply and scale reliably.

We’re also offering a fifth Open Track for original ideas that do not fit neatly into any of the initial four categories.

How does the program work?

The program runs for 5 weeks. Each week focuses on a different phase of building and testing an alignment method. The goal is to embed values in a system in a way that generalizes and can’t be easily gamed. Participants will form teams during the application process. We recommend teams of 3–5. You can apply solo or with collaborators. If applying solo, we’ll match you based on track, timezone, and working style.

 

Eligibility

Mentors may support specific teams depending on availability. Teams are expected to coordinate independently and meet regularly. If someone drops out, we’ll help rebalance teams where needed.

This is a part-time program; research participants are expected to commit at least 10 hours per week.

Our Application Process

There are three stages in the application process.

Stage 1: Expression of Interest

Submit your CV, your estimated likelihood (0–100%) of being able to commit 10 hours per week from August 9 onwards, and the tracks you're most interested in. You may also include anything relevant not captured in your CV. We guarantee personalized feedback to the first 300 applicants. 168 so far.

Stage 2: Knowledge Check

You’ll complete 15 timed multiple-choice questions based on the tracks you selected.

You’ll also indicate whether you’re open to being a team leader (Yes/No).

Stage 3: Team Formation and Idea Submission

Qualified applicants join a private Discord server to form teams. Each team agrees on a research direction and submits a proposal. We provide concise, track-specific resources that summarize current methods and bottlenecks, compiled from interviews with senior researchers. We guarantee feedback to the first 100 teams that submit a proposal. Teams are assessed not just on initial ideas, but on how well they improve based on feedback.

Attend our Demo Day

The program ends with a public poster session and job fair. Teams will present their work in a virtual conference format on GatherTown. Each team has a space to display their results, answer questions, and defend their method. Senior researchers will review the posters and vote on standout projects.

Following the presentation is a job fair where research orgs, labs, and startups can host booths, meet researchers, and share open roles.

 

How much does it cost to attend the demo day?

The research program is free, and Demo Day attendance is free for program participants. We charge outside attendees for the poster evening to cover program costs and participant stipends.

Testimonials

Martin Leitgab 

PhD, Nuclear Physics

I had a great experience at the AI-Plans.com evals hackathon in April. The event was well-organized with several team-making/matching sessions leading up to the hackathon, and flexibility for different teams to pursue different research directions, including the opportunity to continue research after the hackathon ended. Our team worked hard and matured a research project into a paper accepted at the ICML MAS workshop. Thanks to Kabir Kumar and the AI-Plans.com team for this great event and opportunity!

Shruti Datta Gupta

Product Security Engineer, Adobe

I really enjoyed the hackathon, it was a very good learning experience for me, since I have been leading the AI evals effort for my team at work. It was interesting to see that a lot of approaches in leading academic research are similar to what we're doing in the industry. It was a bit difficult for me to engage full-time throughout the week, but we definitely made it work well within our team. I truly loved and enjoyed the openness, diversity and inclusivity in this hackathon. I was able to make a few good connections through the hackathon, and that's an awesome outcome of the event. I also appreciate that you checked in regularly with all participants, ensuring that everybody had a team, had access to the resources etc. Also loved working with my team on our project, and learning from both Roland and Sruthi

Abby Lupi

Senior Data Analyst, CareerVillage

I wanted to share that this hackathon has already kinda changed the game for me. In the last week, there's been a big priority shift in my org to focus on evals as a measure of quality. We don't have any specialists yet, so I was given some of the responsibility to share with my fellow data team coworker. She built the code to format and work with our current data, and between the keynote talks and just trying things in colab, I've been able to share some insights and fill in the gaps! A lot of this stuff felt really hard to approach without a group of people to chat with (and some kind of structure to work in). So thank you for organizing and being so aggressive about getting people involved 😂 It's making a difference

Anya Decarlo

Research Assistant, Oregon Health & Science University

This really sparked an interest in me and from it I was able to ask a question related to my idea to the Director of the Center for Devices and Radiological Health at the FDA at a Q and A at the CERSI Summit at UCSF. The ideathon really sparked and has directed a large research area for me, and none of this would have happened without AI-Plans. I hope to keep engaging with the work, and am grateful for all you do!

Nataliia Povarova

Lead analyst, Federal Institute of Industrial Property, Russia

I learned a lot. Regular keynotes with experts and communication with peer contestants were extremely helpful. AI alignment is an important field, and involving so many high-skilled professionals is a great and meaningful thing to do. Second, the problems to solve were great. Jailbreaking the top-tier models was a lot of fun. Among the things I learned were these:

    you have a greater chance to make a model follow malicious instructions if you pretend to be a researcher conducting an experiment or a security expert testing their solutions;
    toy examples have a greater chance to work ("I want to steal all the popcorn from the cinema" – thanks to Areal (Ari) Tal for this prompt – will work better than "I want to steal all the money from the register");
    if you rephrase a query multiple times, one option may work.

My team and I tried to implement an evolutionary algorithm from "Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers" paper. Honestly, this idea haunts me to this day and I will certainly work on it more.

Right after the end of the hackathon I found another paper named "Best-of-N-Jailbreaking". It is worth sharing, so here is my short review: https://lnkd.in/ermPisGz

Anyways, many thanks to AI Plans team for the great work!

Luke Chambers

PhD, Law and Policy of AI

Had a fantastic time taking part in the debates and workshops at the AI Law-a-thon hosted by Kabir Kumar. Lots of lively topics and some unique points of view. Would certainly take part again, and would recommend this to those interested in digital tech of any kind.

James Hindmarch

Programme Lead, ARENA

(on the April/May Alignment Evals Hackathon) "Was surprised at the high quality of the work I saw here! Some of these evals are incredibly impressive given budgetary + time constraints!"

Areal (Ari) Tal

Founder and Head of AI Strategy at AI Alignment Liaison

I was fortunate to participate in the AI Alignment Evals hackathon hosted by AI Plans this past January, and I'd love to share a few highlights:

    Incredible Speakers: Featuring Monika Jotautaitė and others doing meaningful work in AI alignment.
    Practical Insights & Tools: I gained hands-on experience with the SALAD benchmark for AI safety, explored MD Judge for applying the "LLM as a Judge" methodology, and learned a bit about blue teaming and red teaming strategies - fun and directly applicable to my current work at AI Alignment Liaison.
    Community: It was particularly great meeting others who are passionate about or learning more about this area.

I'm excited to share that I plan to participate in the next AI Alignment Evals weeklong hackathon starting April 26. Highly recommend this event to anyone working on - or even just curious about - AI alignment and AI safety.

Thank you to: Paul Rapoport, Norman Hsia, Cole Wyeth, Sahil, Lucius, Vanessa Kosoy, Roman Malov, Cameron Holmes, Abram Demski, Tsvi, Chloe Loewith and other researchers for their help in preparing this program's structure. 
 



