Trending
Articles related to "RLVR"
High-Entropy Token Selection in Reinforcement Learning with Verifiable Rewards (RLVR) Improves Accuracy and Reduces Training Cost for LLMs
MarkTechPost@AI 2025-06-09T01:40:45.000000Z
Qwen and Tsinghua team overturn conventional wisdom: LLM reinforcement learning on only the 20% of key tokens beats training on all tokens
智源社区 2025-06-06T15:33:05.000000Z
The rewards are fake, but the 25% performance gain for Qwen is real!
智源社区 2025-05-30T07:58:19.000000Z
The rewards are fake, but the 25% performance gain for Qwen is real
36kr-科技 2025-05-30T02:43:11.000000Z
The rewards are fake, but the 25% performance gain for Qwen is real!
量子位 2025-05-29T11:43:12.000000Z
Incorrect Answers Improve Math Reasoning? Reinforcement Learning with Verifiable Rewards (RLVR) Surprises with Qwen2.5-Math
MarkTechPost@AI 2025-05-28T20:45:50.000000Z
Reinforcement learning with random rewards actually works with Qwen 2.5
Interconnects 2025-05-27T16:50:21.000000Z
100 days after the DeepSeek-R1 release: a comprehensive review of replication studies on reasoning LLMs, and what comes next!
PaperAgent 2025-05-08T07:22:57.000000Z
100 days into the "reasoning revolution": DeepSeek-R1 replication studies fully revealed!
智源社区 2025-05-07T00:48:00.000000Z
100 days into the "reasoning revolution": DeepSeek-R1 replication studies fully revealed!
掘金 人工智能 2025-05-06T09:03:14.000000Z
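Reinforcement learning is overrated! Tsinghua and SJTU: RL cannot improve reasoning ability; new knowledge has to come from distillation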
智源社区 2025-04-27T09:48:02.000000Z
Does reinforcement learning really incentivize reasoning capacity in LLMs beyond the base model?
智源社区 2025-04-23T02:42:52.000000Z
R1-Omni open-sourced! Omni-modal model + RLVR makes each modality's contribution clearly visible
通义 2025-04-09T10:05:39.000000Z
Scalable Reinforcement Learning with Verifiable Rewards: Generative Reward Modeling for Unstructured, Multi-Domain Tasks
MarkTechPost@AI 2025-04-05T17:45:58.000000Z
Advancing Medical Reasoning with Reinforcement Learning from Verifiable Rewards (RLVR): Insights from MED-RLVR
MarkTechPost@AI 2025-03-30T02:11:12.000000Z
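Alibaba open-sources R1-Omni, the first to combine DeepSeek-style RLVR with omni-modal emotion recognition; netizens: interpretability + multimodal learning = next-generation AI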
智源社区 2025-03-12T11:00:03.000000Z
R1-Omni open-sourced! Multimodal model + RLVR makes each modality's contribution clearly visible
魔搭ModelScope社区 2025-03-11T15:14:45.000000Z
R1-Omni open-sourced! Omni-modal model + RLVR makes each modality's contribution clearly visible
通义 2025-03-11T12:10:26.000000Z
Alibaba's Tongyi team open-sources R1-Omni: multimodal model + RLVR makes each modality's contribution clearly visible
IT之家 2025-03-11T11:25:47.000000Z
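Alibaba open-sources R1-Omni, the first to combine DeepSeek-style RLVR with omni-modal emotion recognition; netizens: interpretability + multimodal learning = next-generation AI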
36kr-科技 2025-03-11T10:02:06.000000Z