OpenAI Claims IMO Gold Medal

LessWrong, July 19, 18:02


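OpenAI's latest experimental reasoning large language model achieved gold medal-level performance at the International Mathematical Olympiad (IMO), a major milestone for AI on complex reasoning problems. Under strict competition rules, with no tools or internet access, the model independently solved 5 of the 6 IMO 2025 problems, showing strong mathematical reasoning and long-form argumentation. The result reflects two things: the model can produce intricate, watertight arguments, and AI can now handle a much longer reasoning time horizon. According to OpenAI, the result comes from new methods in general-purpose reinforcement learning and test-time compute scaling, not from task-specific optimization.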

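🌟 OpenAI's experimental reasoning LLM reached gold medal-level performance at the International Mathematical Olympiad (IMO). On the 2025 IMO, under the same rules as human contestants, the model independently solved 5 of the 6 problems for a total score of 35/42, enough for a gold medal. This sets a new high for AI on mathematical problems that demand sustained, complex, creative thinking.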

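💡 IMO problems are very difficult and require multi-page natural-language proofs that are hard to verify. The model's intricate, watertight arguments show progress beyond the reinforcement learning paradigm of clear-cut, verifiable rewards, reaching the level of argument of human mathematicians. The benchmarks also mark a much longer reasoning time horizon, measured by roughly how long top human solvers need per problem. The progression runs from about 0.1 minutes for GSM8K to about 1 minute for the MATH benchmark, about 10 minutes for AIME, and about 100 minutes for the IMO.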

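🚀 The model's capabilities come from general-purpose reinforcement learning and test-time compute scaling, not from narrow, task-specific methods. This approach produced significant progress in mathematical reasoning and complex proof generation, which the summary presents as groundwork for future AI use in scientific research and hard problem solving. OpenAI plans to release GPT-5 soon. However, the IMO gold model is an experimental research model, and OpenAI does not plan to release anything with this level of math capability for several months.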

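📈 The IMO gold result far exceeds 2021 forecasts of AI math progress, showing how fast AI has advanced in recent years, especially in reasoning and complex problem solving. In 2021 the author forecast 30% accuracy on the MATH benchmark by July 2025, and even considered other forecasters too optimistic. The actual outcome is IMO gold-level performance, which underscores how quickly AI progress has accelerated.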

Published on July 19, 2025 9:58 AM GMT

I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5-hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs.
Why is this a big deal? First, IMO problems demand a new level of sustained creative thinking compared to past benchmarks. In reasoning time horizon, we’ve now progressed from GSM8K (~0.1 min for top humans) → MATH benchmark (~1 min) → AIME (~10 mins) → IMO (~100 mins).
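The time-horizon figures above grow by roughly a factor of ten at each step. Below is a minimal Python sketch of that arithmetic. It uses only the approximate minutes quoted in the post; the list name and the `step_ratios` helper are illustrative, not from the post.

```python
# Approximate time a top human needs per problem, in minutes, as quoted in the post.
HORIZONS_MIN = [
    ("GSM8K", 0.1),
    ("MATH", 1.0),
    ("AIME", 10.0),
    ("IMO", 100.0),
]


def step_ratios(horizons):
    """Return (from, to, factor) for each consecutive pair of benchmarks."""
    return [
        (prev_name, name, minutes / prev_minutes)
        for (prev_name, prev_minutes), (name, minutes) in zip(horizons, horizons[1:])
    ]


for prev_name, name, ratio in step_ratios(HORIZONS_MIN):
    print(f"{prev_name} -> {name}: ~{ratio:.0f}x longer")

# End-to-end growth from the shortest to the longest horizon.
overall = HORIZONS_MIN[-1][1] / HORIZONS_MIN[0][1]
print(f"GSM8K -> IMO overall: ~{overall:.0f}x")
```

The inputs are order-of-magnitude estimates (the post writes "~"), so the resulting ratios (about 10x per step, about 1000x overall) are rough, not measured.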
Second, IMO submissions are hard-to-verify, multi-page proofs. Progress here calls for going beyond the RL paradigm of clear-cut, verifiable rewards. By doing so, we’ve obtained a model that can craft intricate, watertight arguments at the level of human mathematicians.
https://github.com/aw31/openai-imo-2025-proofs/blob/main/problem_1.txt
Besides the result itself, I am excited about our approach: We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling.
In our evaluation, the model solved 5 of the 6 problems on the 2025 IMO. For each problem, three former IMO medalists independently graded the model’s submitted proof, with scores finalized after unanimous consensus. The model earned 35/42 points in total, enough for gold!
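Standard IMO scoring grades each of the six problems from 0 to 7 points, and the 35/42 total can be unpacked from that. The thread also notes that the model produced no solution for P6, so that problem contributes 0 points. The sketch below shows why a 35-point total then implies full marks on P1 through P5. Its variable names are illustrative.

```python
POINTS_PER_PROBLEM = 7   # each IMO problem is scored from 0 to 7
NUM_PROBLEMS = 6

max_total = POINTS_PER_PROBLEM * NUM_PROBLEMS       # 42 possible points
reported_total = 35                                 # total reported in the post
solved = ["P1", "P2", "P3", "P4", "P5"]             # P6: no solution, so 0 points

# The highest total achievable from the solved problems alone.
max_from_solved = POINTS_PER_PROBLEM * len(solved)  # 35

# The reported total equals that ceiling, so every solved problem
# must have received the full 7 points.
all_full_marks = reported_total == max_from_solved

print(f"{reported_total}/{max_total}, full marks on {len(solved)} problems: {all_full_marks}")
```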
Btw, we are releasing GPT-5 soon, and we’re excited for you to try it. But just to be clear: the IMO gold LLM is an experimental research model. We don’t plan to release anything with this level of math capability for several months.
Still—this underscores how fast AI has advanced in recent years. In 2021, my PhD advisor @JacobSteinhardt had me forecast AI math progress by July 2025. I predicted 30% on the MATH benchmark (and thought everyone else was too optimistic). Instead, we have IMO gold.
If you want to take a look, here are the model’s solutions to the 2025 IMO problems! The model solved P1 through P5; it did not produce a solution for P6. (Apologies in advance for its … distinct style—it is very much an experimental model)
https://github.com/aw31/openai-imo-2025-proofs/
