MarkTechPost@AI · May 2, 03:55
DeepSeek-AI Released DeepSeek-Prover-V2: An Open-Source Large Language Model Designed for Formal Theorem Proving through Subgoal Decomposition and Reinforcement Learning

DeepSeek-AI has released DeepSeek-Prover-V2, an open-source large language model focused on formal theorem proving through subgoal decomposition and reinforcement learning. The model combines DeepSeek-V3's natural-language reasoning with a 7B-parameter prover and achieves notable results on benchmarks such as MiniF2F-test. DeepSeek-Prover-V2 uses a cold-start training strategy built entirely on synthetic data, and improves proof accuracy through curriculum learning and a consistency-based reward mechanism, opening new possibilities for AI in mathematics.

🧠 At the core of DeepSeek-Prover-V2 is the combination of natural-language reasoning with formal proving. DeepSeek-V3 generates a natural-language proof sketch, which a 7B-parameter prover then turns into a formal theorem proof, bridging the gap between informal and formal reasoning.

💡 The model uses a subgoal decomposition strategy that breaks a complex theorem into more tractable subgoals. Each subgoal becomes a "have" statement in Lean 4 and is solved recursively, reducing computational cost while preserving formal rigor.

📈 Training is based entirely on synthetic data, with no human-annotated proof steps, and a curriculum learning framework gradually increases task complexity. The model also uses two types of subgoal decomposition together with a consistency-based reward system, which significantly improves proof accuracy.

🏆 DeepSeek-Prover-V2 performs strongly across multiple benchmarks: an 88.9% pass rate on MiniF2F-test, 49 problems solved on PutnamBench, and 6 problems solved from the AIME 2024-2025 competitions, demonstrating its potential on real-world mathematical problems.

Formal mathematical reasoning has evolved into a specialized subfield of artificial intelligence that requires strict logical consistency. Unlike informal problem solving, which allows for intuition and loosely defined heuristics, formal theorem proving relies on every step being fully described, precise, and verifiable by computational systems. Proof assistants, such as Lean, Coq, and Isabelle, serve as the structural frameworks within which these formal proofs are constructed. Their operation demands logical soundness with no space for omissions, approximations, or unstated assumptions. This makes the challenge particularly demanding for AI systems, especially large language models, which excel in producing coherent natural language responses but typically lack the rigor to produce verifiable formal proofs. However, the desire to blend these strengths, AI’s fluency in informal reasoning and the structure of formal verification, has led to new innovations at the interface of language modeling and formal logic automation.

A major issue arises from the inability of current language models to bridge the conceptual divide between informal and formal reasoning. Language models typically excel at generating human-like explanations and solving math problems written in natural language. However, this reasoning is inherently informal and often lacks the structural precision required by formal logic systems. While humans can intuitively leap from one deductive step to another, proof assistants require a fully specified sequence of steps, free of ambiguity. Thus, the challenge is to guide AI models to produce logically coherent formal outputs from their otherwise informal and intuitive internal reasoning processes. This problem becomes increasingly complex when handling advanced theorems from domains such as number theory or geometry, where precision is crucial.

Recent efforts have attempted to address this issue by guiding models first to generate natural language proof sketches, which are then manually or semi-automatically translated into formal proof steps. A known strategy includes decomposing a complex theorem into smaller subgoals. Each subgoal represents a lemma that can be tackled independently and later combined to form a complete proof. Frameworks like “Draft, Sketch, and Prove” have applied this idea, using language models to generate proof outlines that are then translated into formal language. Another method employs hierarchical reinforcement learning, breaking down complex mathematical problems into simpler layers. However, these models often struggle to produce fully verifiable outputs in Lean or Coq environments. Moreover, the training data for these models is usually limited, and proof attempts frequently fail to yield successful outcomes that provide useful learning signals.

A team of researchers from DeepSeek-AI has introduced a new model, DeepSeek-Prover-V2, designed to generate formal mathematical proofs by leveraging subgoal decomposition and reinforcement learning. The core of their approach utilizes DeepSeek-V3 to break down a complex theorem into manageable subgoals, each of which is translated into a “have” statement in Lean 4 with a placeholder indicating that the proof is incomplete. These subgoals are then passed to a 7B-sized prover model that completes each proof step. Once all steps are resolved, they are synthesized into a complete Lean proof and paired with the original natural language reasoning generated by DeepSeek-V3. This forms a rich cold-start dataset for reinforcement learning. Importantly, the model’s training is entirely bootstrapped from synthetic data, with no human-annotated proof steps used.
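
To make the decomposition concrete, here is a minimal, hypothetical Lean 4 sketch of what such a skeleton could look like (the theorem, names, and statement are invented for illustration and are not taken from the paper): each subgoal appears as a "have" statement whose proof is initially a `sorry` placeholder, left for the 7B prover to fill in before the pieces are combined.

```lean
import Mathlib

-- Hypothetical sketch: a target theorem split into subgoals, each left
-- as a `sorry` placeholder for the prover model to complete.
theorem sum_of_squares_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  -- Subgoal 1: the first square is nonnegative.
  have h1 : 0 ≤ a ^ 2 := by sorry
  -- Subgoal 2: the second square is nonnegative.
  have h2 : 0 ≤ b ^ 2 := by sorry
  -- Once both subgoals are discharged, combine them into the full proof.
  exact add_nonneg h1 h2
```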

The cold-start pipeline begins by prompting DeepSeek-V3 to create proof sketches in natural language. These sketches are transformed into formal theorem statements with unresolved parts. A key innovation lies in recursively solving each subgoal using the 7B prover, reducing computation costs while maintaining formal rigor. Researchers constructed a curriculum learning framework that increased the complexity of training tasks over time. They also implemented two types of subgoal theorems, one incorporating preceding subgoals as premises, and one treating them independently. This dual structure was embedded into the model’s expert iteration stage to train it on progressively more challenging problem sets. The model’s capability was then reinforced through a consistency-based reward system during training, ensuring that all decomposed lemmas were correctly incorporated into the final formal proof.
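
As a hypothetical illustration of the two subgoal styles (statements invented for exposition, not drawn from the paper's training data), a single step of a decomposition can be issued to the prover either with the preceding subgoal attached as a premise or as a standalone theorem:

```lean
import Mathlib

-- Variant 1 (hypothetical): the preceding subgoal is supplied as a
-- premise, so the prover only needs to bridge a single step.
theorem step_with_premise (a b : ℝ) (h1 : 0 ≤ (a - b) ^ 2) :
    2 * a * b ≤ a ^ 2 + b ^ 2 := by
  sorry

-- Variant 2 (hypothetical): the same step stated independently, so the
-- prover must rebuild any earlier reasoning on its own.
theorem step_standalone (a b : ℝ) : 2 * a * b ≤ a ^ 2 + b ^ 2 := by
  sorry
```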

On the MiniF2F-test benchmark, the model achieved an 88.9% pass rate with high sampling (Pass@8192), compared to 82.0% by Kimina-Prover and 64.7% by Goedel-Prover. It also solved 49 out of 658 problems from PutnamBench, a benchmark of challenging competition mathematics. On the newly introduced ProverBench dataset, comprising 325 formalized problems, the model solved 6 of the 15 problems drawn from the AIME (American Invitational Mathematics Examination) competitions for 2024 and 2025. These results highlight the model's generalization across multiple formal reasoning tasks. Even when compared to DeepSeek-V3, which employs natural-language reasoning, the new model demonstrates competitive performance, solving a comparable number of AIME problems while ensuring formal verifiability.
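
For reference, Pass@k here denotes the usual sampling-based metric (assuming the standard definition; the paper may report a variant): a problem counts as solved if at least one of the k sampled proofs is verified by Lean, i.e.

$$\text{Pass@}k = \frac{\#\{\text{problems with at least one of } k \text{ sampled proofs verified}\}}{\#\{\text{problems in the benchmark}\}}.$$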

Several Key Takeaways from the Research on DeepSeek-Prover-V2:

- DeepSeek-Prover-V2 pairs DeepSeek-V3's informal, natural-language reasoning with a 7B prover that produces verifiable Lean 4 proofs.
- Complex theorems are decomposed into subgoals expressed as "have" statements and solved recursively, keeping computation manageable while preserving formal rigor.
- Training is bootstrapped entirely from synthetic cold-start data, with no human-annotated proof steps, and organized as a curriculum of progressively harder tasks.
- Two styles of subgoal theorems plus a consistency-based reward ensure that every decomposed lemma is correctly incorporated into the final formal proof.
- The model reaches an 88.9% pass rate on MiniF2F-test (Pass@8192), solves 49 of 658 PutnamBench problems, and solves 6 of 15 AIME 2024-2025 problems from the new ProverBench set.
Check out the Paper and GitHub Page for the model.

