MarkTechPost@AI · January 24
O1-Pruner: Streamlining Long-Thought Reasoning in Language Models

Large language models show strong capabilities on reasoning tasks, but long reasoning processes drive up computation time and energy consumption. O1-Pruner uses reinforcement learning to optimize a model's reasoning paths, shortening reasoning length and improving efficiency without sacrificing accuracy. The technique evaluates baseline performance through pre-sampling and fine-tunes the model's reasoning length with a custom reinforcement-learning loss so that it matches problem complexity. Experiments on mathematical reasoning benchmarks show that O1-Pruner significantly reduces inference time and computational cost while maintaining or even improving accuracy, laying the groundwork for future work on optimizing reasoning models.

⏱️ The core of O1-Pruner is a length-harmonizing fine-tuning method that balances reasoning length and accuracy. It evaluates reasoning quality and length by sampling from a reference model, generating multiple solutions per problem to create a performance benchmark.

🎯 O1-Pruner designs a reward function with both a length reward and an accuracy reward. The length reward encourages solutions that are shorter than the reference model's, while the accuracy reward ensures that shorter reasoning paths do not hurt correctness.

🚀 O1-Pruner trains with Proximal Policy Optimization (PPO) for reinforcement learning and uses off-policy training to simplify the workflow and reduce training complexity. The technique can dynamically adjust reasoning depth to match problem complexity, making it applicable to a wide range of tasks.

📊 On mathematical reasoning benchmarks such as MATH, GSM8K, and GaoKao, models fine-tuned with O1-Pruner maintain or even improve accuracy while significantly shortening their reasoning paths. For example, the Marco-o1-7B model's reasoning length drops by 40.5% while its accuracy rises to 76.8%.

⚡ O1-Pruner also brings substantial gains in inference time. On the MATH dataset, Marco-o1-7B's inference time drops from about 2 minutes to just over 1 minute, and QwQ-32B-Preview's from roughly 6 minutes to about 4 minutes.

Large language models (LLMs) have introduced impressive capabilities, particularly in reasoning tasks. Models like OpenAI’s O1 utilize “long-thought reasoning,” where complex problems are broken into manageable steps and solutions are refined iteratively. While this approach enhances problem-solving, it comes at a cost: extended output sequences lead to increased computational time and energy use. These inefficiencies raise concerns about scalability and the practical usability of such models in real-world applications. Addressing this issue is essential for making LLMs more efficient and broadly applicable.

Researchers from Sun Yat-sen University, China Agriculture University, Tsinghua University, the University of Oxford, Didichuxing, and NTU propose Length-Harmonizing Fine-Tuning (O1-Pruner). This technique seeks to reduce the inefficiencies in reasoning models while maintaining accuracy. The primary focus is on optimizing token usage, which is a significant bottleneck in current models. O1-Pruner uses reinforcement learning (RL) techniques to encourage the generation of shorter reasoning paths without sacrificing precision.

The process begins with evaluating baseline performance through pre-sampling. A customized RL-style loss function then fine-tunes the model’s reasoning length, ensuring that the generated solutions are proportional to the complexity of the problem. By aligning reasoning length with task difficulty, O1-Pruner reduces computational costs without compromising on quality.
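To make the pre-sampling step concrete, here is a minimal Python sketch of how a per-problem baseline could be estimated before fine-tuning. The helpers `generate_solution` and `is_correct`, the sample count `k`, and the whitespace token count are illustrative assumptions, not the released O1-Pruner implementation.

```python
import statistics

def pre_sample_baseline(problems, generate_solution, is_correct, k=8):
    """Estimate each problem's baseline length and accuracy with the reference model.

    generate_solution(problem) -> str       # hypothetical: samples one reasoning trace
    is_correct(problem, solution) -> bool   # hypothetical: checks the final answer
    """
    baseline = {}
    for problem in problems:
        solutions = [generate_solution(problem) for _ in range(k)]
        lengths = [len(s.split()) for s in solutions]               # crude token-count proxy
        correctness = [float(is_correct(problem, s)) for s in solutions]
        baseline[problem] = {
            "mean_length": statistics.mean(lengths),
            "mean_accuracy": statistics.mean(correctness),
        }
    return baseline
```

The resulting per-problem averages serve as the reference lengths and accuracies against which shorter, fine-tuned reasoning traces are scored.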

Technical Details and Benefits of O1-Pruner

At the heart of O1-Pruner is the Length-Harmonizing Fine-Tuning approach, which balances reasoning length and accuracy. The key steps include:

    Reference Model Sampling: A reference model evaluates reasoning quality and length by generating multiple solutions for each problem, creating a performance benchmark.
    Reward Function Design: This involves two components (see the sketch after this list):
        Length Reward: Shorter solutions relative to the reference model are encouraged.
        Accuracy Reward: Ensures that shorter reasoning paths do not compromise correctness.
    Reinforcement Learning Framework: Proximal Policy Optimization (PPO) is used to train the model efficiently. Off-policy training further simplifies the workflow and reduces training complexity.
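As a rough illustration of this reward design, the snippet below combines a length reward (positive when a sampled trace is shorter than the reference average) with an accuracy penalty (applied when correctness drops below the reference baseline). The specific functional form and the weight `lambda_acc` are assumptions for illustration; the paper's exact reward may differ.

```python
def length_harmonizing_reward(sample_length, sample_correct,
                              ref_mean_length, ref_mean_accuracy,
                              lambda_acc=2.0):
    """Illustrative reward combining a length term and an accuracy term.

    sample_length     -- token count of the sampled reasoning trace
    sample_correct    -- 1.0 if the sampled answer is correct, else 0.0
    ref_mean_length   -- reference model's mean trace length (from pre-sampling)
    ref_mean_accuracy -- reference model's mean accuracy (from pre-sampling)
    lambda_acc        -- assumed weight trading brevity against correctness
    """
    # Length reward: positive when the sampled trace is shorter than the
    # reference model's average, negative when it is longer.
    length_reward = (ref_mean_length - sample_length) / ref_mean_length
    # Accuracy reward: penalizes only drops below the reference baseline, so
    # shorter reasoning is never rewarded at the expense of correctness.
    accuracy_reward = min(0.0, sample_correct - ref_mean_accuracy)
    return length_reward + lambda_acc * accuracy_reward
```

In an RL setup such as PPO, a scalar of this kind would serve as the per-sample reward when fine-tuning the policy off-policy against traces drawn from the reference model.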

The benefits of O1-Pruner include:

    Shorter reasoning paths, which cut inference time and computational cost.
    Accuracy that is maintained or even improved relative to the baseline models.
    Reasoning depth that adapts to problem complexity, making the approach applicable across a range of tasks.

Results and Insights

Experiments on mathematical reasoning benchmarks such as MATH, GSM8K, and GaoKao showcase O1-Pruner's effectiveness. For example, the Marco-o1-7B model's reasoning length dropped by 40.5% while its accuracy rose to 76.8%.

Inference time also improved significantly. On the MATH dataset, Marco-o1-7B's inference time fell from about 2 minutes to just over 1 minute, and QwQ-32B-Preview's from roughly 6 minutes to about 4 minutes.

These results highlight O1-Pruner’s ability to balance accuracy and efficiency. Its superior performance, as measured by the Accuracy-Efficiency Score (AES), establishes it as a better alternative to other methods like Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO).

Conclusion

O1-Pruner demonstrates that efficient reasoning in LLMs is achievable without compromising accuracy. By harmonizing reasoning length with problem complexity, it addresses the computational inefficiencies inherent in long-thought reasoning. This work lays the groundwork for further advancements in optimizing reasoning models, enabling their application in diverse, real-world scenarios where efficiency and accuracy are equally critical.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
