MarkTechPost@AI 18小时前
ByteDance Introduces Seed-Prover: An Advanced Formal Reasoning System for Automated Mathematical Theorem Proving
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

文章介绍了字节跳动Seed团队开发的Seed-Prover系统,该系统是一种基于Lean形式语言的引理式全证明推理模型,通过迭代优化和测试时推理策略,显著提升了数学定理证明的自动化水平。Seed-Prover在IMO等数学竞赛和基准测试中取得了state-of-the-art的成果,例如在IMO 2025中解决了6道题中的5道。此外,文章还提出了Seed-Geometry,一个专门的几何推理引擎,以克服Lean在处理几何支持方面的局限性。该系统通过结合形式语言和大型语言模型,为数学研究和教育提供了强大的新工具。

💡 Seed-Prover是一种创新的引理式全证明推理模型,它以引理为中心,通过迭代细化和Lean反馈,以及自我总结来优化数学证明。这种方法区别于传统的逐步生成或整体生成方式,能够处理更复杂的推理过程。

🚀 配合Seed-Geometry几何推理引擎,Seed-Prover在处理包含几何内容的数学问题时表现出色,有效弥补了Lean在几何支持方面的不足。这种组合使得系统在解决如IMO竞赛中的几何题目时更具优势。

🏆 Seed-Prover在多个数学基准测试中取得了领先的成绩,包括在IMO 2025中成功解决5/6道题目,以及在过往IMO题目中达到78.1%的成功率。在MiniF2F和PutnamBench等测试中也显示出显著的性能提升,证明了其在不同难度和领域的数学推理能力。

⚙️ 该系统利用多阶段、多任务的强化学习(基于VAPO)进行训练,并结合了专有的测试时推理策略。训练数据集包含公开和内部数据,并经过优化以排除过于简单的任务,确保模型专注于解决更具挑战性的问题。

LLMs have shown notable improvements in mathematical reasoning by extending through natural language, resulting in performance gains on benchmarks such as MATH and AIME. However, reinforcement learning (RL) for training these models encounters a challenge: verifying the correctness of natural language proofs is very difficult, requiring careful manual checking of each reasoning step. This limits the application of RL for training mathematical theorem-proving models. While formal languages like Lean offer automatic correctness verification, current LLM formal provers face their limitations. Step-level provers generate code incrementally but require special scaffolding and lack high-level reasoning capabilities.

ByteDance Seed Team introduces Seed-Prover, a lemma-style whole-proof reasoning model. It refines proofs iteratively using Lean feedback, previously established lemmas, and self-summarization. Seed-Prover employs three specialized test-time inference strategies that allow deep and broad reasoning methods to solve IMO-level contest problems. Its primary innovation is in adopting lemma-style proving as its core method, placing lemmas at the center of the reasoning process rather than relying on traditional step-by-step or whole-proof generation methods. Moreover, this paper introduces Seed-Geometry,  a complementary geometry reasoning engine that overcomes Lean’s limitations in handling geometric support.

For interaction between Seed-Prover and Lean, multi-stage, multi-task RL based on VAPO is utilized. The training dataset combines open-source datasets with in-house formal problems, using a proposer to create simpler variants of difficult tasks. It excludes overly simple problems with proof rates above 25%. Seed-Geometry’s backend supports large-scale problem generation, identifying over 230 million unique problems across seven days with an eightfold improvement in search efficiency. A separate policy and value model is trained, though extensive testing shows that value models may reduce performance due to estimation errors. As a result, step-by-step generation with beam search is adopted in distributed setups.

Seed-Prover achieves state-of-the-art results across multiple mathematical benchmarks. For IMO 2025, Seed-Prover fully solves 5 out of 6 problems, with Seed-Geometry instantly solving Problem 2 and Seed-Prover deriving proofs for the remaining problem using various inference settings. On past IMO problems, it proved 121 out of 155 tasks, achieving a 78.1% success rate across all difficulty levels. The performance breakdown shows strong results across problem categories: solving 47 out of 55 easy problems, 47 out of 56 medium problems, and 27 out of 44 hard problems, with subject-specific success rates including 72 out of 85 in algebra, 42 out of 55 in number theory, and 7 out of 14 in combinatorics.

On MiniF2F, researchers achieve a 99.6% proof rate for both validation and test sets under medium settings, solving difficult problems such as IMO 1990 P3. PutnamBench results show improvement from 201 to 331 solved problems out of 657 when upgrading from light to medium inference settings, showing a significant performance jump over previous undergraduate-level mathematical reasoning systems. On CombiBench, Seed-Prover solves 30 out of 100 combinatorics problems, outperforming existing methods but revealing continued challenges in combinatorial reasoning. Researchers achieve 81.8% success on MiniCTX-v2, showing strong generalization beyond competition problems and outperforming the o4-mini baseline’s 44.3% at Pass@8.

In conclusion, ByteDance Seed presents Seed-Geometry and Seed-Prover, two formal reasoning methods that integrate the capabilities of LLMs. Seed-Geometry provides accelerated verification and enhanced search mechanisms while Seed-Prover utilizes iterative refinement and complex test-time inference strategies. The achievement of solving 5 out of 6 problems in the IMO 2025 shows the practical efficacy of these methods in tackling elite mathematical competitions. The adoption of formal languages like Lean provides rapid proof verification that is more cost-effective than human experts and more reliable than LLM-based judges. Future research will focus on combining formal systems with LLMs to address open conjectures.


Check out the Paper and GitHub Page. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.

The post ByteDance Introduces Seed-Prover: An Advanced Formal Reasoning System for Automated Mathematical Theorem Proving appeared first on MarkTechPost.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

Seed-Prover 数学证明 人工智能 形式语言 定理证明
相关文章