MarkTechPost@AI · February 16
ReasonFlux: Elevating LLM Reasoning with Hierarchical Template Scaling

ReasonFlux is a new framework that rethinks how large language models (LLMs) plan and execute reasoning steps through a hierarchical, template-guided strategy, improving LLM performance on complex reasoning tasks. It combines a curated library of high-level thought templates with hierarchical reinforcement learning (HRL) to dynamically plan and refine reasoning paths. ReasonFlux consists of three main components: a structured template library, hierarchical reinforcement learning, and adaptive inference scaling. On competition-level benchmarks such as MATH, AIME, and OlympiadBench, ReasonFlux outperforms models including GPT-4o, Claude, DeepSeek-V3, and Mathstral, opening a new path for deploying advanced reasoning in resource-constrained environments.

💡 At the core of ReasonFlux is the **structured template library** of 500 thought templates. Each template encapsulates a problem-solving strategy and carries metadata for efficient retrieval; these templates can guide an LLM to apply specific algebraic substitutions or other strategies.

🧠 ReasonFlux uses **hierarchical reinforcement learning**: structure-based fine-tuning teaches the LLM to associate template metadata with functional descriptions, so it understands when and how to apply each template. Through preference learning, the model then learns to rank template sequences by their effectiveness, optimizing template trajectories.

🚀 During inference, ReasonFlux acts as a "navigator," analyzing the problem to retrieve relevant templates and dynamically adjusting the trajectory based on intermediate results. This iterative interplay between planning and execution mirrors human problem solving, where partial solutions inform subsequent steps.

🏆 ReasonFlux performs strongly across benchmarks: 91.2% accuracy on MATH, exceeding OpenAI's o1-preview by 6.7%; 56.7% on AIME 2024, surpassing DeepSeek-V3 by 45%; and 63.3% on OlympiadBench, a 14% improvement over previous methods.

Large language models (LLMs) have demonstrated exceptional problem-solving abilities, yet complex reasoning tasks—such as competition-level mathematics or intricate code generation—remain challenging. These tasks demand precise navigation through vast solution spaces and meticulous step-by-step deliberation. Existing methods, while improving accuracy, often suffer from high computational costs, rigid search strategies, and difficulty generalizing across diverse problems. The researchers introduce ReasonFlux, a framework that addresses these limitations by reimagining how LLMs plan and execute reasoning steps using hierarchical, template-guided strategies.

Recent approaches to enhance LLM reasoning fall into two categories: deliberate search and reward-guided methods. Techniques like Tree of Thoughts (ToT) enable LLMs to explore multiple reasoning paths, while Monte Carlo Tree Search (MCTS) decomposes problems into steps guided by process reward models (PRMs). Though effective, these methods scale poorly due to excessive sampling and manual search design. For instance, MCTS requires iterating through thousands of potential steps, making it computationally prohibitive for real-world applications. Meanwhile, retrieval-augmented generation (RAG) methods like Buffer of Thoughts (BoT) leverage stored problem-solving templates but struggle to integrate multiple templates adaptively, limiting their utility in complex scenarios.

ReasonFlux introduces a structured framework that combines a curated library of high-level thought templates with hierarchical reinforcement learning (HRL) to dynamically plan and refine reasoning paths. Instead of optimizing individual steps, it focuses on configuring optimal template trajectories—sequences of abstract problem-solving strategies retrieved from a structured knowledge base. This approach simplifies the search space and enables efficient adaptation to sub-problems. The framework consists of three main components:

    Structured Template Library:  The research team constructed a library of 500 thought templates, each encapsulating a problem-solving strategy (e.g., “Trigonometric Substitution for Integral Optimization”). Templates include metadata—names, tags, descriptions, and application steps—enabling efficient retrieval. For example, a template tagged “Irrational Function Optimization” might guide an LLM to apply specific algebraic substitutions.  
    Hierarchical Reinforcement Learning:
      Structure-Based Fine-Tuning: A base LLM (e.g., Qwen2.5-32B) is fine-tuned to associate template metadata with functional descriptions, ensuring it understands when and how to apply each template.
      Template Trajectory Optimization: Using preference learning, the model learns to rank template sequences by their effectiveness. For a given problem, multiple trajectories are sampled, and their success rates on similar problems determine rewards. This trains the model to prioritize high-reward sequences, refining its planning capability.
    Adaptive Inference Scaling:  During inference, ReasonFlux acts as a “navigator,” analyzing the problem to retrieve relevant templates and dynamically adjusting the trajectory based on intermediate results. For instance, if a step involving “Polynomial Factorization” yields unexpected constraints, the system might pivot to a “Constraint Propagation” template. This iterative interplay between planning and execution mirrors human problem-solving, where partial solutions inform subsequent steps.  
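The template metadata described above (names, tags, descriptions, application steps) lends itself to a simple inverted index. The sketch below is illustrative, not the paper's implementation; the class and field names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class ThoughtTemplate:
    """One entry in the structured template library (field names assumed)."""
    name: str          # e.g. "Trigonometric Substitution for Integral Optimization"
    tags: list         # retrieval keys, e.g. ["Irrational Function Optimization"]
    description: str   # when the strategy applies
    steps: list        # abstract application steps

class TemplateLibrary:
    def __init__(self, templates):
        self.templates = list(templates)
        # Inverted index from tag -> templates for efficient retrieval.
        self.by_tag = {}
        for t in self.templates:
            for tag in t.tags:
                self.by_tag.setdefault(tag, []).append(t)

    def retrieve(self, tag):
        """Return all templates whose metadata matches the query tag."""
        return self.by_tag.get(tag, [])

lib = TemplateLibrary([
    ThoughtTemplate(
        name="Trigonometric Substitution for Integral Optimization",
        tags=["Irrational Function Optimization"],
        description="Replace sqrt(a^2 - x^2) terms via x = a*sin(t).",
        steps=["identify radical", "substitute", "simplify", "back-substitute"],
    )
])
matches = lib.retrieve("Irrational Function Optimization")
```

In practice retrieval would likely combine tags with embedding similarity over the descriptions, but a tag index is enough to show the metadata's role.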
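The trajectory-optimization step ranks sampled template sequences by their success rate on similar problems. The article does not give the exact objective; a minimal Bradley-Terry-style pairwise sketch, with `solve` standing in as a hypothetical solution checker, might look like:

```python
import math

def success_rate(trajectory, similar_problems, solve):
    """Empirical reward: fraction of similar problems this template
    trajectory solves (`solve` is a hypothetical checker)."""
    return sum(solve(trajectory, p) for p in similar_problems) / len(similar_problems)

def preference_loss(score_chosen, score_rejected):
    """Bradley-Terry pairwise loss: minimized when the planner scores the
    higher-success-rate trajectory above the lower one."""
    return -math.log(1.0 / (1.0 + math.exp(score_rejected - score_chosen)))
```

Averaging this loss over sampled trajectory pairs trains the planner to prefer sequences that empirically succeed, without scoring individual reasoning steps.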
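The navigator loop in the adaptive inference step—retrieve a template, execute it, and pivot when intermediate results suggest a different strategy—can be sketched as follows. `apply_step` is a hypothetical callable standing in for the LLM executor:

```python
def navigate(problem_tag, library, apply_step, max_rounds=8):
    """Planner-executor loop (sketch). `library` maps tag -> template name;
    `apply_step` runs one template and returns (done, next_tag)."""
    trajectory = []
    tag = problem_tag
    for _ in range(max_rounds):
        template = library.get(tag)
        if template is None:
            break
        trajectory.append(template)
        done, next_tag = apply_step(template)
        if done:
            break
        tag = next_tag  # pivot based on intermediate results
    return trajectory

# Toy run mirroring the article's example: factorization reveals an
# unexpected constraint, so the navigator pivots to a new template.
lib = {"factor": "Polynomial Factorization", "constraint": "Constraint Propagation"}
def apply_step(template):
    if template == "Polynomial Factorization":
        return False, "constraint"  # unexpected constraint -> re-plan
    return True, None

path = navigate("factor", lib, apply_step)
```

The `max_rounds` cap bounds inference cost, reflecting the framework's aim of scaling reasoning without MCTS-style exhaustive sampling.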

ReasonFlux was evaluated on competition-level benchmarks like MATH, AIME, and OlympiadBench, outperforming both frontier models (GPT-4o, Claude) and specialized open-source models (DeepSeek-V3, Mathstral). Key results include 91.2% accuracy on MATH (6.7% above OpenAI's o1-preview), 56.7% on AIME 2024 (surpassing DeepSeek-V3 by 45%), and 63.3% on OlympiadBench (a 14% gain over previous methods).

Moreover, the structured template library demonstrated strong generalization: when applied to variant problems, it boosted smaller models (e.g., 7B parameters) to outperform larger counterparts using direct reasoning. Additionally, ReasonFlux achieved a superior exploration-exploitation balance, requiring 40% fewer computational steps than MCTS and Best-of-N on complex tasks (Figure 5).  

In summary, ReasonFlux redefines how LLMs approach complex reasoning by decoupling high-level strategy from step-by-step execution. Its hierarchical template system reduces computational overhead while improving accuracy and adaptability, addressing critical gaps in existing methods. By leveraging structured knowledge and dynamic planning, the framework sets a new standard for efficient, scalable reasoning—proving that smaller, well-guided models can rival even the largest frontier systems. This innovation opens avenues for deploying advanced reasoning in resource-constrained environments, from education to automated code generation.  


Check out the Paper. All credit for this research goes to the researchers of this project.


