MarkTechPost@AI, February 11
Google DeepMind Introduces AlphaGeometry2: A Significant Upgrade to AlphaGeometry Surpassing the Average Gold Medalist in Solving Olympiad Geometry

 

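AlphaGeometry2 (AG2) is a major upgrade to AlphaGeometry (AG1) that surpasses the average gold medalist at solving International Mathematical Olympiad (IMO) geometry problems. By expanding its domain language, AG2 handles complex geometric concepts better and raises its coverage of IMO problems from 66% to 88%. AG2 integrates a Gemini-based language model, a more efficient symbolic engine, and a novel search algorithm with knowledge sharing. These improvements raise its solve rate on 2000–2024 IMO geometry problems to 84%. AG2 also moves toward a fully automated system that interprets problems from natural language.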

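💡AlphaGeometry2 (AG2) is a major improvement over AlphaGeometry (AG1): by combining a language model with a symbolic reasoning engine, it significantly improves the ability to solve International Mathematical Olympiad (IMO) geometry problems.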

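📐AG2 extends its domain language with additional predicates for linear equations, movement, and common geometric problems, raising coverage of IMO geometry problems from 66% to 88% (2000–2024).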

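🔍AG2 integrates a Gemini-based language model, a more efficient symbolic engine, and a new knowledge-sharing search algorithm, raising its solve rate on 2000–2024 IMO geometry problems to 84% and surpassing the average gold medalist.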

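🤖AG2 also makes progress on automated formalization, using foundation models to translate natural-language problems into AG syntax, and uses a two-stage optimization method to generate diagrams for non-constructive problems.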

The International Mathematical Olympiad (IMO) is a globally recognized competition that challenges high school students with complex mathematical problems. Among its four problem categories, geometry stands out as the most consistent in structure, making it more accessible and well-suited for fundamental reasoning research. Automated geometry problem-solving has traditionally followed two primary approaches: algebraic methods, such as Wu’s method, the area method, and Gröbner bases, and synthetic techniques, including deductive databases and the full-angle method. The synthetic approach aligns more closely with human reasoning and is particularly valuable for broader research applications.

Previous research introduced AlphaGeometry (AG1), a neuro-symbolic system designed to solve IMO geometry problems by integrating a language model with a symbolic reasoning engine. AG1 achieved a 54% solve rate on IMO geometry problems from 2000 to 2024, marking a significant step in automated problem-solving. However, its performance was hindered by limitations in its domain-specific language, the efficiency of its symbolic engine, and the capability of its initial language model. These constraints kept AG1 from improving beyond that solve rate despite its promising approach.

AlphaGeometry2 (AG2) is a major advancement over its predecessor, surpassing the problem-solving abilities of an average IMO gold medalist. Researchers from Google DeepMind, the University of Cambridge, Georgia Tech, and Brown University expanded its domain language to handle complex geometric concepts, improving its coverage of IMO problems from 66% to 88%. AG2 integrates a Gemini-based language model, a more efficient symbolic engine, and a novel search algorithm with knowledge sharing. These enhancements boost its solving rate to 84% on IMO geometry problems from 2000-2024. Additionally, AG2 advances toward a fully automated system that interprets problems from natural language.

AG2 expands the AG1 domain language by introducing additional predicates to address limitations in expressing linear equations, movement, and common geometric problems. It enhances coverage from 66% to 88% of IMO geometry problems (2000–2024). AG2 supports new problem types, such as locus problems, and improves diagram formalization by allowing points to be defined using multiple predicates. Automated formalization, aided by foundation models, translates natural language problems into AG syntax. Diagram generation employs a two-stage optimization method for non-constructive problems. AG2 also strengthens its symbolic engine, DDAR, for faster and more efficient deduction closure, enhancing proof search capabilities.
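To make the idea of a deduction closure concrete, here is a minimal sketch of forward chaining to a fixed point: rules are applied to the known facts repeatedly until nothing new can be derived. This is a toy illustration, not DDAR itself. The `para` predicate, the two rules, and the function names are all assumptions chosen for the example and are not taken from AG2's actual domain language.

```python
# Toy forward-chaining deduction closure (illustrative only, not DDAR).
# A fact is a tuple such as ("para", "l1", "l2"), read as "line l1 is
# parallel to line l2". The predicate name and rules are hypothetical.

def symmetric(facts):
    """If a is parallel to b, then b is parallel to a."""
    return {("para", b, a) for (p, a, b) in facts if p == "para"}

def transitive(facts):
    """If a is parallel to b and b is parallel to d, then a is parallel to d."""
    pairs = [(a, b) for (p, a, b) in facts if p == "para"]
    return {
        ("para", a, d)
        for (a, b) in pairs
        for (c, d) in pairs
        if b == c and a != d
    }

def deduction_closure(facts, rules):
    """Apply every rule repeatedly until no new fact is produced."""
    closure = set(facts)
    while True:
        new_facts = set()
        for rule in rules:
            new_facts |= rule(closure) - closure
        if not new_facts:
            return closure  # fixed point reached
        closure |= new_facts

premises = {("para", "l1", "l2"), ("para", "l2", "l3")}
closure = deduction_closure(premises, [symmetric, transitive])
print(("para", "l3", "l1") in closure)
```

A naive loop like this re-examines every fact on every pass, which is the kind of cost a faster closure computation would aim to reduce. How DDAR actually achieves its speedup is not described in this article.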

AlphaGeometry2 achieves a high solve rate on IMO geometry problems from 2000–2024, solving 42 of the 50 problems in the IMO-AG-50 benchmark and surpassing an average gold medalist. It also solves all 30 hardest formalizable IMO shortlist problems. Performance improves rapidly during training: the model already solves 27 problems after 250 training steps. Ablation studies reveal optimal inference settings. Some problems remain unsolved because their conditions cannot be formalized or because DDAR lacks advanced geometry techniques. Experts find its solutions highly creative. Despite these limitations, AlphaGeometry2 outperforms AG1 and other systems, demonstrating state-of-the-art capabilities in automated problem-solving.
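The percentages quoted in this article follow directly from the problem counts on the 50-problem benchmark. The short check below only reproduces that arithmetic. The helper name `solve_rate` is an assumption made for illustration, and the sketch assumes the 27 problems solved after 250 training steps are counted against the same 50 problems.

```python
def solve_rate(solved, total):
    """Return the solve rate as a whole-number percentage."""
    return round(100 * solved / total)

BENCHMARK_SIZE = 50  # number of problems in IMO-AG-50, per the article

final_rate = solve_rate(42, BENCHMARK_SIZE)  # AG2's reported result
early_rate = solve_rate(27, BENCHMARK_SIZE)  # after 250 training steps

print(final_rate, early_rate)
```

Under that assumption, the early checkpoint's 27 problems correspond to 54%, the same percentage the article quotes for AG1, while the final 42 problems give the reported 84%.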

In conclusion, AlphaGeometry2 significantly improves upon its predecessor by incorporating a more advanced language model, an enhanced symbolic engine, and a novel proof search algorithm. It achieves an 84% solve rate on 2000–2024 IMO geometry problems, surpassing the previous 54%. Studies reveal that language models can generate full proofs without external tools, and different training approaches yield complementary skills. Challenges remain, including limitations in handling inequalities and variable points. Future work will focus on subproblem decomposition, reinforcement learning, and refining auto-formalization for more reliable solutions. Continued improvements aim to create a fully automated system for solving geometry problems efficiently.



The post Google DeepMind Introduces AlphaGeometry2: A Significant Upgrade to AlphaGeometry Surpassing the Average Gold Medalist in Solving Olympiad Geometry appeared first on MarkTechPost.
