MarkTechPost@AI · 21 hours ago
OThink-R1: A Dual-Mode Reasoning Framework to Cut Redundant Computation in LLMs

 

The OThink-R1 framework, developed by researchers at Zhejiang University and OPPO, targets redundant computation in the reasoning of large reasoning models (LRMs). Mimicking human thinking, it switches intelligently between fast and slow reasoning modes to improve efficiency. OThink-R1 analyzes reasoning trajectories to identify essential and redundant steps, then uses a dedicated loss function and a pruned fine-tuning dataset to teach the model to adjust its reasoning style to task complexity. Experiments show that OThink-R1 cuts unnecessary reasoning by more than 23% while preserving accuracy, pointing toward more adaptive, scalable, and efficient AI reasoning systems.

🧠 Existing LRMs overuse elaborate reasoning on simple tasks, driving up computational cost. OThink-R1 addresses this by mimicking human thinking and switching intelligently between fast and slow reasoning.

💡 The core of OThink-R1 is its ability to switch dynamically between fast and slow reasoning. The framework analyzes reasoning trajectories to distinguish essential from redundant steps, and uses a "judge" model to help train LRMs to adjust their reasoning style to task complexity.

🔬 The framework employs a dual-reference KL-divergence loss that encourages the model to move flexibly between the fast and slow reasoning modes. Tests on math and question-answering tasks show that OThink-R1 reduces reasoning redundancy while maintaining or even improving accuracy, striking a better efficiency-performance balance than existing models.

⚙️ OThink-R1 prunes redundant reasoning steps while retaining essential logic, cutting unnecessary computation. Experiments show it performs well across diverse datasets, generating fewer tokens while matching or improving accuracy, which underscores its strength in adaptive reasoning.

The Inefficiency of Static Chain-of-Thought Reasoning in LRMs

Recent LRMs achieve top performance by using detailed CoT reasoning to solve complex tasks. However, many simple tasks they handle could be solved by smaller models with fewer tokens, making such elaborate reasoning unnecessary. This echoes human thinking, where we use fast, intuitive responses for easy problems and slower, analytical thinking for complex ones. While LRMs mimic slow, logical reasoning, they generate significantly longer outputs, thereby increasing computational cost. Current methods for reducing reasoning steps lack flexibility, limiting models to a single fixed reasoning style. There is a growing need for adaptive reasoning that adjusts effort according to task difficulty. 

Limitations of Existing Training-Based and Training-Free Approaches

Recent research on improving reasoning efficiency in LRMs can be categorized into two main areas: training-based and training-free methods. Training strategies often use reinforcement learning or fine-tuning to limit token usage or adjust reasoning depth, but they tend to follow fixed patterns without flexibility. Training-free approaches utilize prompt engineering or pattern detection to shorten outputs during inference; however, they also lack adaptability. More recent work focuses on variable-length reasoning, where models adjust reasoning depth based on task complexity. Others study “overthinking,” where models over-reason unnecessarily. However, few methods enable dynamic switching between quick and thorough reasoning—something this paper addresses directly. 

Introducing OThink-R1: Dynamic Fast/Slow Reasoning Framework

Researchers from Zhejiang University and OPPO have developed OThink-R1, a new approach that enables LRMs to switch between fast and slow thinking smartly, much like humans do. By analyzing reasoning patterns, they identified which steps are essential and which are redundant. With help from another model acting as a judge, they trained LRMs to adapt their reasoning style based on task complexity. Their method reduces unnecessary reasoning by over 23% without losing accuracy. Using a loss function and fine-tuned datasets, OThink-R1 outperforms previous models in both efficiency and performance on various math and question-answering tasks. 
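The judge-driven dataset construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_judge`, `prune_reasoning`, and `build_training_example` are hypothetical names, and the real system uses an LLM as the judge rather than a keyword stub.

```python
# Hypothetical sketch of judge-based pruning for building the fine-tuning set.
# In OThink-R1 the judge is itself an LLM; here a keyword stub stands in for it.

def prune_reasoning(steps, judge):
    """Keep only the reasoning steps the judge labels essential."""
    return [step for step in steps if judge(step) == "essential"]

def build_training_example(question, steps, answer, judge):
    """Pair each question with its pruned (fast-thinking) reasoning trace."""
    return {
        "question": question,
        "reasoning": prune_reasoning(steps, judge),
        "answer": answer,
    }

def toy_judge(step):
    """Stub judge: treat re-verification and restatement as redundant."""
    redundant_markers = ("let me double-check", "restating", "to re-verify")
    return "redundant" if step.lower().startswith(redundant_markers) else "essential"

example = build_training_example(
    "What is 17 + 25?",
    ["17 + 25 = 42.", "Let me double-check: 25 + 17 is also 42."],
    "42",
    toy_judge,
)
# Only the essential step survives pruning.
print(example["reasoning"])
```

In the actual pipeline, examples whose traces are pruned to near-nothing become fast-thinking training data, while traces where every step is essential remain slow-thinking examples.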

System Architecture: Reasoning Pruning and Dual-Reference Optimization

The OThink-R1 framework helps LRMs dynamically switch between fast and slow thinking. First, it identifies when LRMs include unnecessary reasoning, like overexplaining or double-checking, versus when detailed steps are truly essential. Using this, it builds a curated training dataset by pruning redundant reasoning and retaining valuable logic. Then, during fine-tuning, a special loss function balances both reasoning styles. This dual-reference loss compares the model’s outputs with both fast and slow thinking variants, encouraging flexibility. As a result, OThink-R1 can adaptively choose the most efficient reasoning path for each problem while preserving accuracy and logical depth. 
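The dual-reference idea can be illustrated with a toy loss over discrete next-token distributions. This is a sketch under assumed definitions: the paper's exact loss, weighting, and reference models may differ, and `alpha`, `fast_ref`, and `slow_ref` are illustrative names.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def dual_reference_kl_loss(student, fast_ref, slow_ref, alpha=0.5):
    """Hypothetical dual-reference loss: pull the student's distribution
    toward both a fast-thinking and a slow-thinking reference model,
    weighted by alpha. Minimizing it keeps the student close to both
    styles rather than collapsing onto one."""
    return (alpha * kl_divergence(student, fast_ref)
            + (1 - alpha) * kl_divergence(student, slow_ref))

# Toy next-token distributions over a 3-token vocabulary.
student  = [0.5, 0.3, 0.2]
fast_ref = [0.7, 0.2, 0.1]
slow_ref = [0.4, 0.4, 0.2]
loss = dual_reference_kl_loss(student, fast_ref, slow_ref)
```

Because the loss is anchored to two references at once, gradients do not push the fine-tuned model exclusively toward short or long outputs, which is what preserves the ability to pick either mode at inference time.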

Empirical Evaluation and Comparative Performance

The OThink-R1 model was tested on simpler QA and math tasks to evaluate its ability to switch between fast and slow reasoning. On datasets such as OpenBookQA, CommonsenseQA, ASDIV, and GSM8K, the model performed strongly, generating fewer tokens while maintaining or improving accuracy. Compared to baselines such as NoThinking and DualFormer, OThink-R1 struck a better balance between efficiency and effectiveness. Ablation studies confirmed the importance of pruning, KL constraints, and the LLM-Judge in achieving optimal results. A case study illustrated that unnecessary reasoning can lead to overthinking and reduced accuracy, highlighting OThink-R1's strength in adaptive reasoning. 

Conclusion: Towards Scalable and Efficient Hybrid Reasoning Systems

In conclusion, OThink-R1 is a large reasoning model that adaptively switches between fast and slow thinking modes to improve both efficiency and performance. It addresses the issue of unnecessarily complex reasoning in large models by analyzing and classifying reasoning steps as either essential or redundant. By pruning the redundant ones while maintaining logical accuracy, OThink-R1 reduces unnecessary computation. It also introduces a dual-reference KL-divergence loss to strengthen hybrid reasoning. Tested on math and QA tasks, it cuts reasoning redundancy by over 23% without sacrificing accuracy, showing promise for building more adaptive, scalable, and efficient AI reasoning systems in the future. 


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.

