Articles related to "RLVR"
Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning
cs.AI updates on arXiv.org
2025-07-24T05:30:57.000000Z
The Invisible Leash: Why RLVR May Not Escape Its Origin
cs.AI updates on arXiv.org
2025-07-22T04:44:46.000000Z
Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning
cs.AI updates on arXiv.org
2025-07-22T04:34:07.000000Z
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
cs.AI updates on arXiv.org
2025-07-14T04:08:37.000000Z
StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason
cs.AI updates on arXiv.org
2025-07-04T04:08:26.000000Z
84%-166% Performance Gains! L-Zero Unlocks Large Models' Ability to Explore the World Using Reinforcement Learning Alone | Now Open Source
智源社区
2025-07-02T08:25:58.000000Z
OpenAI's Approach Questioned! Meta Researcher: It Simply Cannot Build Superintelligence
智源社区
2025-06-21T13:22:09.000000Z
High-Entropy Token Selection in Reinforcement Learning with Verifiable Rewards (RLVR) Improves Accuracy and Reduces Training Cost for LLMs
MarkTechPost@AI
2025-06-09T01:40:45.000000Z
Qwen and Tsinghua Team Overturn Conventional Wisdom: RL for Large Models Using Only the 20% Critical Tokens Beats Training on All Tokens
智源社区
2025-06-06T15:33:05.000000Z
The Rewards Are Fake, but the 25% Performance Boost for Qwen Is Real!
智源社区
2025-05-30T07:58:19.000000Z
The Rewards Are Fake, but the 25% Performance Boost for Qwen Is Real
36kr-科技
2025-05-30T02:43:11.000000Z
The Rewards Are Fake, but the 25% Performance Boost for Qwen Is Real!
量子位
2025-05-29T11:43:12.000000Z
Incorrect Answers Improve Math Reasoning? Reinforcement Learning with Verifiable Rewards (RLVR) Surprises with Qwen2.5-Math
MarkTechPost@AI
2025-05-28T20:45:50.000000Z
Reinforcement learning with random rewards actually works with Qwen 2.5
Interconnects
2025-05-27T16:50:21.000000Z
100 Days After the DeepSeek-R1 Release: A Full Review of Reasoning-Model Replication Studies and What Comes Next!
PaperAgent
2025-05-08T07:22:57.000000Z
100 Days Into the "Reasoning Revolution": DeepSeek-R1 Replication Studies Fully Revealed!
智源社区
2025-05-07T00:48:00.000000Z
100 Days Into the "Reasoning Revolution": DeepSeek-R1 Replication Studies Fully Revealed!
掘金 人工智能
2025-05-06T09:03:14.000000Z
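Reinforcement Learning Is Overrated! Tsinghua and SJTU: RL Cannot Improve Reasoning Ability; New Knowledge Must Come from Distillation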
智源社区
2025-04-27T09:48:02.000000Z
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
智源社区
2025-04-23T02:42:52.000000Z
R1-Omni Open-Sourced! Omni-Modal Model + RLVR Makes Each Modality's Role Clearly Visible
通义
2025-04-09T10:05:39.000000Z