MarkTechPost@AI August 17, 2024
DeepSeek-AI Open-Sources DeepSeek-Prover-V1.5: A Language Model with 7 Billion Parameters that Outperforms all Open-Source Models in Formal Theorem Proving in Lean 4

DeepSeek-AI has released DeepSeek-Prover-V1.5, an open-source large language model with 7 billion parameters that outperforms all open-source models in formal theorem proving in Lean 4. DeepSeek-Prover-V1.5 unifies proof-step and whole-proof generation through a mechanism called "truncate-and-resume" and, via enhanced pre-training, supervised fine-tuning, and reinforcement learning, achieves strong results across multiple benchmarks, marking a breakthrough in the field.

🤔 DeepSeek-Prover-V1.5 adopts a "truncate-and-resume" mechanism that combines proof-step and whole-proof generation. The language model first generates complete proof code, which the Lean prover then verifies. If an error is detected, the code is truncated at the first error message, and the successfully generated portion serves as the prompt for the next proof segment. Integrated into Monte-Carlo tree search (MCTS), the mechanism enables flexible truncation points, and a reward-free exploration algorithm resolves the reward-sparsity problem in proof search, encouraging broad exploration of the tactic state space.

💪 In the pre-training stage, DeepSeek-Prover-V1.5 was further trained on mathematics and code data, with a focus on formal languages such as Lean, Isabelle, and Metamath. In the supervised fine-tuning stage, the Lean 4 code-completion dataset was improved through two data-augmentation techniques: (1) using DeepSeek-Coder V2 236B to add natural-language chain-of-thought comments, and (2) inserting intermediate tactic-state information into Lean 4 proof code. In the reinforcement learning stage, the GRPO algorithm was applied for reinforcement learning from proof assistant feedback (RLPAF), with Lean prover verification results used as rewards.

🏆 DeepSeek-Prover-V1.5 delivers outstanding performance across multiple benchmarks. On the miniF2F-test dataset, DeepSeek-Prover-V1.5-RL achieved a 60.2% pass rate in single-pass whole-proof generation, a 10.2 percentage point improvement over its predecessor. With a limited sampling budget of 128 attempts, it proved 51.6% of the problems, outperforming other whole-proof generation methods and matching leading tree-search methods. Enhanced with RMaxTS tree search, DeepSeek-Prover-V1.5-RL achieved a state-of-the-art pass rate of 62.7%, surpassing the previous best result with significantly fewer samples. On the ProofNet dataset, it achieved pass rates of 22.6% in the single-pass setting and 25.3% with RMaxTS, outperforming existing methods. These results show that DeepSeek-Prover-V1.5 excels across different theorem-proving tasks and methodologies.

🚀 DeepSeek-Prover-V1.5 establishes an AlphaZero-like pipeline for formal theorem proving that leverages expert iteration and synthetic data. The framework currently focuses on exploration; future development may add a critic model for assessing incomplete proofs, addressing the exploitation side of reinforcement learning in theorem proving.

🤩 The open-source release of DeepSeek-Prover-V1.5 will drive further progress in formal theorem proving and provide a valuable tool for building more capable AI systems.

Large language models (LLMs) have made significant strides in mathematical reasoning and theorem proving, yet they face considerable challenges in formal theorem proving using systems like Lean and Isabelle. These systems demand rigorous derivations that adhere to strict formal specifications, posing difficulties even for advanced models such as GPT-4. The core challenge lies in the model’s need to simultaneously comprehend the syntax and semantics of formal systems while aligning abstract mathematical reasoning with precise formal representations. This complex task requires a deep understanding of coding intricacies and mathematical concepts, creating a significant hurdle for current AI systems in producing complex formal proofs.
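
To make the difficulty concrete, here is a minimal Lean 4 example (illustrative only, not from the paper): the statement is an exact formal specification, and the proof is accepted only if every step type-checks against Lean's kernel.

```lean
-- A toy Lean 4 theorem: commutativity of natural-number addition.
-- The proof must be checked by Lean's kernel; any invalid tactic
-- or type mismatch causes verification to fail outright.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```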

Researchers from DeepSeek-AI introduced DeepSeek-Prover-V1.5, a unified approach that combines the strengths of proof-step and whole-proof generation techniques through a robust truncate-and-resume mechanism. This method begins with whole-proof generation, where the language model produces complete proof code based on the theorem statement. The Lean prover then verifies this code. If an error is detected, the code is truncated at the first error message, and the successfully generated portion serves as a prompt for the next proof segment. The latest state from the Lean 4 prover is appended as a comment to the prompt to enhance accuracy. The truncate-and-resume mechanism is integrated into Monte-Carlo tree search (MCTS), allowing for flexible truncation points determined by the tree search policy. In addition, a reward-free exploration algorithm is proposed to address the reward sparsity issue in proof search, assigning intrinsic motivation to the tree search agent for extensive exploration of the tactic state space.
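
As a rough illustration of that cycle, the sketch below implements the generate-verify-truncate loop described above. It is a minimal sketch, not the authors' code: `model_generate`, `lean_verify`, and the `LeanResult` fields are assumed interfaces standing in for the language model and the Lean 4 prover.

```python
from dataclasses import dataclass

@dataclass
class LeanResult:
    """Hypothetical result of running the Lean 4 prover on a candidate."""
    success: bool
    first_error_offset: int  # character offset of the first error
    tactic_state: str        # latest tactic state before the error

def truncate_and_resume(theorem_stmt: str, model_generate, lean_verify,
                        max_rounds: int = 8):
    """Sketch of the truncate-and-resume cycle described above.

    `model_generate(stmt, prefix)` samples proof code from the model;
    `lean_verify(stmt, code) -> LeanResult` runs the Lean 4 prover.
    Both are assumed interfaces, passed in as callables.
    """
    prefix = ""  # verified portion of the proof accumulated so far
    for _ in range(max_rounds):
        # 1. Whole-proof generation, conditioned on the verified prefix.
        candidate = prefix + model_generate(theorem_stmt, prefix)

        # 2. Verify the complete candidate with the Lean prover.
        result = lean_verify(theorem_stmt, candidate)
        if result.success:
            return candidate  # complete, verified proof

        # 3. Truncate at the first error message; keep what succeeded.
        prefix = candidate[: result.first_error_offset]

        # 4. Append the latest tactic state as a comment to guide
        #    the next generation round, as the article describes.
        prefix += f"\n  -- tactic state: {result.tactic_state}\n"
    return None  # no verified proof within the budget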

This study presents the following contributions:

• Pre-Training: Enhanced base model with further training on mathematics and code data, focusing on formal languages like Lean, Isabelle, and Metamath.

• Supervised Fine-Tuning: Improved Lean 4 code completion dataset through two data augmentation techniques:

  1. Used DeepSeek-Coder V2 236B to add natural language chain-of-thought comments.

  2. Inserted intermediate tactic state information within Lean 4 proof code.

• Reinforcement Learning: Employed the GRPO algorithm for reinforcement learning from proof assistant feedback (RLPAF), using Lean prover verification results as rewards (a sketch of the group-relative reward step appears after this list).

• Monte-Carlo Tree Search: Advanced tree search method with:

  1. Truncate-and-resume mechanism as state-action abstraction.

  2. RMaxTS algorithm, utilizing the RMax strategy for exploration in sparse-reward proof search.

  3. Intrinsic rewards assigned to encourage diverse planning paths and extensive proof-space exploration (see the RMaxTS sketch after this list).
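
As promised above, here is a schematic of the group-relative reward step at the core of GRPO, assuming a binary pass/fail RLPAF reward from Lean verification. This is a sketch of the general GRPO idea rather than the paper's training code; `lean_verify` is again an assumed verifier interface.

```python
import statistics

def grpo_advantages(theorem_stmt, sampled_proofs, lean_verify):
    """Group-relative advantages, the core of GRPO, sketched.

    All proofs are sampled for the same theorem; each receives a
    binary RLPAF reward from Lean verification. GRPO normalizes
    rewards within the group instead of training a value model.
    """
    # Binary reward: 1.0 if the Lean prover accepts the proof.
    rewards = [1.0 if lean_verify(theorem_stmt, p).success else 0.0
               for p in sampled_proofs]

    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard: all-equal group

    # Each sample's advantage is its reward relative to the group.
    return [(r - mean) / std for r in rewards]
```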

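And here is a sketch of the RMax-style intrinsic reward that RMaxTS uses to cope with reward sparsity: tactic states the search has never visited earn a maximal exploration bonus, while known states earn none. The function and its bookkeeping are illustrative assumptions in the spirit of the RMax strategy, not the paper's exact algorithm.

```python
def rmax_intrinsic_reward(tactic_state: str, visit_counts: dict,
                          r_max: float = 1.0) -> float:
    """RMax-style intrinsic reward for tree search, sketched.

    Unvisited tactic states receive the maximal reward r_max,
    giving the search agent intrinsic motivation to explore the
    tactic state space; already-explored states yield no bonus.
    """
    key = hash(tactic_state)
    if visit_counts.get(key, 0) == 0:
        visit_counts[key] = 1
        return r_max  # optimism: unseen states are maximally rewarding
    visit_counts[key] += 1
    return 0.0  # no intrinsic bonus for known states
```
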
DeepSeek-Prover-V1.5 demonstrates significant advancements in formal theorem proving across multiple benchmarks. On the miniF2F-test dataset, DeepSeek-Prover-V1.5-RL achieved a 60.2% pass rate in single-pass whole-proof generation, marking a 10.2 percentage point improvement over its predecessor. With a limited sampling budget of 128 attempts, it proved 51.6% of problems, outperforming other whole-proof generation methods and matching leading tree-search methods. When enhanced with RMaxTS tree search, DeepSeek-Prover-V1.5-RL achieved a state-of-the-art 62.7% pass rate, and it surpassed the previous best result with significantly fewer samplings. On the ProofNet dataset, DeepSeek-Prover-V1.5-RL achieved pass rates of 22.6% and 25.3% in single-pass and RMaxTS-enhanced settings, respectively, outperforming existing methods. These results demonstrate DeepSeek-Prover-V1.5’s superior performance across different theorem-proving tasks and methodologies.

DeepSeek-Prover-V1.5, a 7 billion parameter language model, sets new benchmarks in formal theorem proving using Lean 4. Built on DeepSeek-Prover-V1.5-Base, it undergoes specialized pre-training, comprehensive supervised fine-tuning, and reinforcement learning via GRPO. The model incorporates RMaxTS, an innovative Monte-Carlo tree search variant, to enhance problem-solving through extensive exploration. This framework establishes an AlphaZero-like pipeline for formal theorem proving, utilizing expert iteration and synthetic data. While the current focus is on exploration, future developments may include a critic model for assessing incomplete proofs, addressing the exploitation aspect of reinforcement learning in theorem proving.


Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
