Agentic-R1: Distilled Dual-Strategy Reasoning

cs.AI updates on arXiv.org, July 9, 12:01


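This paper introduces DualDistill, a fine-tuning framework that distills complementary reasoning strategies from multiple teacher models. It is used to train Agentic-R1, a model that dynamically selects the best strategy for each problem, improving accuracy and efficiency on mathematical reasoning and logical tasks.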

arXiv:2507.05707v1 Announce Type: cross

Abstract: Current long chain-of-thought (long-CoT) models excel at mathematical reasoning but rely on slow and error-prone natural language traces. Tool-augmented agents address arithmetic via code execution, but often falter on complex logical tasks. We introduce a fine-tuning framework, DualDistill, that distills complementary reasoning strategies from multiple teachers into a unified student model. Using this approach, we train Agentic-R1, which dynamically selects the optimal strategy for each query, invoking tools for arithmetic and algorithmic problems, and using text-based reasoning for abstract ones. Our method improves accuracy across a range of tasks, including both computation-intensive and standard benchmarks, demonstrating the effectiveness of multi-strategy distillation in achieving robust and efficient reasoning. Our project is available at https://github.com/StigLidu/DualDistill
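The abstract contrasts slow natural-language traces with tool-augmented agents that handle arithmetic by executing code. As a rough illustration of the code-execution half only, the sketch below assumes a hypothetical convention in which a reasoning trace wraps executable Python in fenced blocks, runs each block, and collects its printed output. The names `FENCE` and `run_code_segments` are illustrative and not taken from the DualDistill repository; the sketch does not model how Agentic-R1 learns when to call a tool versus reason in text.

```python
import contextlib
import io
import re

# Hypothetical convention: executable code in a trace is wrapped in a
# triple-backtick fence tagged "python". Built with "`" * 3 to keep this
# example free of literal fence markers.
FENCE = "`" * 3
CODE_BLOCK = re.compile(re.escape(FENCE) + r"python\n(.*?)" + re.escape(FENCE), re.DOTALL)


def run_code_segments(trace: str) -> list[str]:
    """Execute each fenced Python block in a reasoning trace; return each block's stdout.

    Illustration only: exec() is unsafe for untrusted code, and a real
    tool-augmented agent would run it in a sandbox.
    """
    outputs = []
    for code in CODE_BLOCK.findall(trace):
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # fresh globals for every block
        outputs.append(buffer.getvalue().strip())
    return outputs


# Arithmetic that is tedious in text is delegated to code execution.
trace = (
    "Summing the squares of 1..100 by hand is error-prone, so compute it.\n"
    f"{FENCE}python\nprint(sum(i * i for i in range(1, 101)))\n{FENCE}\n"
)
print(run_code_segments(trace))  # ['338350']
```

A trace with no fenced block returns an empty list, which corresponds to a purely text-based answer.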


