MarkTechPost@AI | July 25, 12:10
DualDistill and Agentic-R1: How AI Combines Natural Language and Tool Use for Superior Math Problem Solving

Researchers at Carnegie Mellon University recently proposed DualDistill, a framework that addresses the high computational cost and error-proneness of existing long-CoT reasoning models on mathematical problems. The framework innovatively combines two complementary teacher models: one focused on natural-language reasoning and one that leverages tools. The resulting Agentic-R1 model dynamically selects the strategy best suited to each problem type, invoking a code interpreter for arithmetic and algorithmic tasks while applying natural-language reasoning to abstract concepts. Through trajectory composition and self-distillation, DualDistill effectively fuses the knowledge of both teachers into a single student model. Evaluations show that Agentic-R1 performs strongly across multiple mathematical reasoning benchmarks, significantly outperforming models that rely on a single strategy, and that it uses tools intelligently on complex problems, striking a good balance between efficiency and accuracy; its performance improves even when the teacher models are imperfect.

💡 **DualDistill fuses two complementary strategies:** By combining knowledge from a reasoning-oriented teacher and a tool-augmented teacher, the framework trains Agentic-R1 to switch flexibly between natural-language reasoning and a code interpreter depending on the nature of the problem, overcoming the limitations a single model faces across different problem types.

🚀 **Agentic-R1 uses tools efficiently and intelligently:** Agentic-R1 activates code-execution tools on computation-heavy problems while cutting back on tool calls for simpler ones, exhibiting an optimized tool-usage pattern. This behavior is acquired through supervised fine-tuning alone, without explicit instruction, effectively balancing computational efficiency and reasoning accuracy.

🏆 **Strong results on mathematical reasoning benchmarks:** Agentic-R1 significantly outperforms similarly sized language-only and tool-only models on benchmarks including DeepMath-L and Combinatorics300. It not only does better on tasks that benefit from tool assistance but also maintains high efficiency on pure reasoning tasks.

💪 **Robust to imperfect teachers:** DualDistill improves the student model even when a teacher performs poorly. For example, with a teacher accuracy of only 48.4%, the student model Agentic-R1 still improves from 44.7% to 50.9%, demonstrating the framework's robustness and knowledge-transfer ability.

Existing long-CoT reasoning models have achieved state-of-the-art performance in mathematical reasoning by generating reasoning trajectories with iterative self-verification and refinement. However, open-source long-CoT models depend only on natural language reasoning traces, making them computationally expensive and prone to errors without verification mechanisms. Although tool-aided reasoning provides greater efficiency and reliability for large-scale numerical computations through frameworks like OpenHands that integrate code interpreters, these agentic approaches struggle with abstract or conceptually complex reasoning problems.
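To make the efficiency contrast concrete: a combinatorial quantity that is tedious and error-prone to derive token by token in natural language is exact and instant for a code interpreter. A minimal Python illustration (the specific quantity is our own example, not one from the paper):

```python
from math import comb

# Counting monotone lattice paths on a 20x20 grid that never cross
# the diagonal -- i.e., the 20th Catalan number. Deriving this digit
# by digit in prose invites arithmetic slips; code gets it exactly.
n = 20
catalan = comb(2 * n, n) // (n + 1)
print(catalan)  # 6564120420
```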

DualDistill Framework and Agentic-R1 Model

Researchers from Carnegie Mellon University have proposed DualDistill, a distillation framework that combines trajectories from two complementary teachers to create a unified student model. The framework pairs a reasoning-oriented teacher with a tool-augmented teacher to develop Agentic-R1, a model that learns to dynamically select the most appropriate strategy for each problem type: it executes code for arithmetic and algorithmic tasks while employing natural-language reasoning for abstract problems. DualDistill uses trajectory composition to distill knowledge from both teachers, followed by self-distillation. The researchers used OpenHands as the agentic reasoning teacher and DeepSeek-R1 as the text-based reasoning teacher.
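The paper's exact pipeline is not reproduced here, but the core idea of trajectory composition, stitching segments from the reasoning teacher and the tool-augmented teacher into one training trajectory for supervised fine-tuning, can be sketched roughly as follows. The `Segment` class, the `<think>`/`<tool>` tag format, and `compose_trajectory` are all illustrative assumptions, not the paper's actual data format:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    teacher: str   # "reasoning" (e.g., DeepSeek-R1) or "agentic" (e.g., OpenHands)
    text: str      # natural-language reasoning, or code plus its execution output

def compose_trajectory(segments: list[Segment]) -> str:
    """Stitch teacher segments into one composite training trajectory.
    Hypothetical tags mark where the strategy switches."""
    parts = []
    for seg in segments:
        tag = "<think>" if seg.teacher == "reasoning" else "<tool>"
        parts.append(f"{tag}{seg.text}{tag.replace('<', '</')}")
    return "".join(parts)

# A composite trajectory: plan in natural language, delegate the heavy
# arithmetic to code, then verify the result in prose.
traj = compose_trajectory([
    Segment("reasoning", "The count reduces to a Catalan number..."),
    Segment("agentic", "from math import comb; print(comb(40, 20) // 21)"),
    Segment("reasoning", "The executed result matches the closed form."),
])
# `traj` would then serve as a supervised fine-tuning target for the
# student model (Agentic-R1).
```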

Paper: https://arxiv.org/abs/2507.05707

Evaluation and Benchmarks

The proposed method is evaluated across multiple benchmarks, including DeepMath-L and Combinatorics300, which test different aspects of mathematical reasoning, and compared against DeepSeek-R1-Distill and Qwen2.5-Instruct baselines. The student model, Agentic-R1, shows substantial gains by drawing on both agentic and reasoning strategies, outperforming two similarly sized models that each specialize in a single strategy: tool-assisted (Qwen2.5-7B-Instruct) and pure reasoning (DeepSeek-R1-Distill-7B). Agentic-R1 beats tool-based models by falling back on natural-language reasoning when it is required, while remaining more efficient than pure reasoning models on standard mathematical tasks.

Qualitative Analysis and Tool Usage Patterns

Qualitative examples show that Agentic-R1 exhibits intelligent tool usage patterns, activating code execution tools in 79.2% of computationally demanding Combinatorics300 problems, while reducing activation to 52.0% for the simpler AMC dataset problems. Agentic-R1 learns to invoke tools appropriately through supervised fine-tuning alone, without explicit instruction, effectively balancing computational efficiency and reasoning accuracy.
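The tool-invocation behavior described above amounts to an inference-time loop: generate until the model emits a tool call, execute the code, feed the output back, and continue. A minimal sketch of such a loop, assuming a hypothetical `generate` callable and the same illustrative `<tool>` tag format as the sketch above (the actual Agentic-R1 serving format may differ):

```python
import re
import subprocess

# Hypothetical marker format: the model wraps code it wants executed
# in <tool>...</tool> tags.
TOOL_RE = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)

def run_code(code: str) -> str:
    """Execute a model-emitted snippet and capture its output (sandboxing omitted)."""
    proc = subprocess.run(
        ["python", "-c", code], capture_output=True, text=True, timeout=10
    )
    return proc.stdout or proc.stderr

def agentic_loop(prompt: str, generate, max_rounds: int = 8) -> str:
    """Alternate generation and code execution until the model stops calling tools."""
    transcript = prompt
    for _ in range(max_rounds):
        completion = generate(transcript)  # hypothetical LLM call
        transcript += completion
        match = TOOL_RE.search(completion)
        if match is None:  # no tool call: the completion is the final answer
            return transcript
        transcript += f"\n[execution output]\n{run_code(match.group(1))}\n"
    return transcript
```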

Robustness to Imperfect Teachers

The framework remains effective even when guided by imperfect teachers. For instance, the agentic teacher achieves only 48.4% accuracy on Combinatorics300, yet the student model improves from 44.7% to 50.9%, ultimately outperforming its teacher.

Conclusion

In summary, the DualDistill framework effectively combines the strengths of natural language reasoning and tool-assisted problem solving by distilling complementary knowledge from two specialized teacher models into a single versatile student model, Agentic-R1. Through trajectory composition and self-distillation, Agentic-R1 learns to dynamically select the most appropriate strategy for each problem, balancing precision and computational efficiency. Evaluations across diverse mathematical reasoning benchmarks demonstrate that Agentic-R1 outperforms both pure reasoning and tool-based models, even when learning from imperfect teachers. This work highlights a promising approach to building adaptable AI agents capable of integrating heterogeneous problem-solving strategies for more robust and efficient reasoning.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
