Trending
Articles related to "Open-Reasoner-Zero"
Open-Reasoner-Zero: An Open-source Implementation of Large-Scale Reasoning-Oriented Reinforcement Learning Training
MarkTechPost@AI 2025-02-25T06:42:55.000000Z
Surprise! Is the GRPO used by DeepSeek-R1 actually not optimal? PPO is enough for large-scale reinforcement learning training
机器之心 2025-02-21T05:49:07.000000Z