MarkTechPost@AI 02月01日
Meet RAGEN Framework: The First Open-Source Reproduction of DeepSeek-R1 for Training Agentic Models via Reinforcement Learning

DeepSeekAI has made notable progress in large language models and reinforcement learning, working toward AI agents that can make decisions independently, especially on multi-step tasks. Its new framework, RAGEN, mirrors DeepSeek-R1's training method to address the inconsistent decision-making, unstable rewards, and limited planning that plague conventional AI training. RAGEN uses a two-phase training approach: in the rollout phase, environment states and model-generated reasoning tokens are processed together; in the update phase, only critical tokens (actions and rewards) are used for learning, ensuring stable batched rollouts and better decision-making. Tested in the Sokoban puzzle environment, RAGEN demonstrated its effectiveness at improving agent training, with promising applications in areas such as logistics automation and AI assistants.

🚀 DeepSeekAI's RAGEN framework is the first open-source reproduction of the DeepSeek-R1 method, focused on the challenges of training AI agents for multi-step reasoning and real-world tasks.

🧠 RAGEN streamlines agent training with a two-phase approach: the rollout phase processes environment states and reasoning tokens together, while the update phase learns only from critical tokens (actions and rewards), keeping training stable.

🧩 Tests in the Sokoban puzzle environment show that, under RAGEN, smaller models perform comparably to larger ones, and models given no explicit instructions still adapt well to the environment, demonstrating the framework's efficiency and generalization.

🎯 By addressing inconsistent decision-making, unstable rewards, and limited planning, RAGEN markedly improves AI agent training and offers a new approach for applications such as logistics automation and AI assistants.

Developing AI agents capable of independent decision-making, especially for multi-step tasks, is a significant challenge. DeepSeekAI, a leader in large language models and reinforcement learning, focuses on enabling AI to process information, predict outcomes, and adjust its actions as situations evolve, underscoring the importance of sound reasoning in dynamic settings. Its new work draws on state-of-the-art methods in reinforcement learning, large language models, and agent-based decision-making, and tackles common problems such as inconsistent decision-making, poor long-term planning, and an inability to adapt to changing conditions. Without a proper reasoning mechanism, an AI agent can take suboptimal actions or commit outright errors.

Many AI training methodologies suffer from inconsistent processing, which leads to errors on tasks that require multiple rounds of decision-making. These approaches fail to model how the environment responds to the agent's actions, so the consequences of those actions go unanalyzed. Training is also carried out in a step-by-step procedure that breaks up learning sequences and destabilizes reward functions, preventing the development of sound long-term policies. The result is inefficient and ineffective decision-making and problem-solving. DeepSeekAI addresses this with more integrated, streamlined training that helps the agent make consistent, dependable decisions while adapting quickly to new environments.

Meet RAGEN, the first reproduction of the DeepSeek-R1(-Zero) methods for training agentic models, built to address the challenges of training AI agents for multi-step reasoning and real-world tasks. DeepSeekAI, known for its advancements in large language models and reinforcement learning, developed DeepSeek-R1 to enhance agentic reasoning through structured training. Unlike methods that struggle with inconsistent batch processing, limited planning, and unstable rewards, RAGEN streamlines training with a two-phase approach: a rollout phase, in which environment states and model-generated reasoning tokens are processed together, and an update phase, in which only critical tokens (actions and rewards) contribute to learning, ensuring stable batched rollouts and improving decision-making. The framework avoids the instability caused by variable sequence lengths by generating reasoning and action tokens during rollout, executing only the actions in the environment, and reinforcing strategic planning through reward aggregation in the update phase. Tested in the Sokoban puzzle environment, RAGEN showed that smaller models perform comparably to larger ones and that models without explicit instructions adapt well. By reproducing DeepSeek-R1's training methodology, RAGEN improves sequential decision-making, making it valuable for applications like logistics automation and AI assistants.
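The update-phase idea described above, where reasoning tokens are generated but only action tokens (weighted by aggregated reward) contribute to the learning signal, can be sketched as a masked REINFORCE-style loss. This is an illustrative sketch, not RAGEN's actual code: the `Token` dataclass and `masked_policy_loss` function are hypothetical names introduced here for clarity.

```python
from dataclasses import dataclass

@dataclass
class Token:
    kind: str       # "reasoning" or "action" (as tagged during rollout)
    logprob: float  # log-probability of the token under the current policy

def masked_policy_loss(tokens, episode_return):
    """REINFORCE-style loss over a rollout's tokens.

    Reasoning tokens are masked out (weight 0), so only action tokens,
    scaled by the aggregated episode return, drive the gradient.
    """
    loss = 0.0
    for t in tokens:
        mask = 1.0 if t.kind == "action" else 0.0
        loss -= mask * t.logprob * episode_return
    return loss

# Example: a mixed rollout where only the two action tokens matter.
rollout = [
    Token("reasoning", -0.5),
    Token("action", -1.0),
    Token("reasoning", -2.0),
    Token("action", -0.2),
]
loss = masked_policy_loss(rollout, episode_return=2.0)  # -> 2.4
```

Masking the reasoning tokens this way is one plausible reading of why the update phase stays stable: the variable-length chain-of-thought never enters the loss, so batches with very different reasoning lengths still produce comparable gradients.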

Ultimately, RAGEN improves the training of AI agents by eliminating inconsistent decision-making, unstable rewards, and planning limitations. By mimicking DeepSeek-R1's approach, it ensures stable learning and better adaptability. Tested on the Sokoban puzzle, it showed that smaller models can perform well, a sign of its efficiency. As a baseline for future research, RAGEN can help refine AI training methods, improve reinforcement learning, and support advances in general-purpose AI systems.


Check out the GitHub Page. All credit for this research goes to the researchers of this project.


