Articles related to "RLVR"
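Reinforcement learning is overrated! Tsinghua and SJTU: RL cannot improve reasoning ability; new knowledge has to come from distillation
BAAI Community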
2025-04-27T09:48:02.000000Z
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
BAAI Community
2025-04-23T02:42:52.000000Z
R1-Omni open-sourced! Omni-modal model + RLVR makes the role of each modality clearly visible
Tongyi
2025-04-09T10:05:39.000000Z
Scalable Reinforcement Learning with Verifiable Rewards: Generative Reward Modeling for Unstructured, Multi-Domain Tasks
MarkTechPost@AI
2025-04-05T17:45:58.000000Z
Advancing Medical Reasoning with Reinforcement Learning from Verifiable Rewards (RLVR): Insights from MED-RLVR
MarkTechPost@AI
2025-03-30T02:11:12.000000Z
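Alibaba open-sources R1-Omni, combining DeepSeek-style RLVR with omni-modal emotion recognition for the first time; netizens: interpretability + multimodal learning = next-generation AI
BAAI Community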
2025-03-12T11:00:03.000000Z
R1-Omni open-sourced! Multimodal model + RLVR makes the role of each modality clearly visible
ModelScope Community
2025-03-11T15:14:45.000000Z
R1-Omni open-sourced! Omni-modal model + RLVR makes the role of each modality clearly visible
Tongyi
2025-03-11T12:10:26.000000Z
Alibaba's Tongyi team open-sources R1-Omni: multimodal model + RLVR makes the role of each modality clearly visible
ITHome
2025-03-11T11:25:47.000000Z
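Alibaba open-sources R1-Omni, combining DeepSeek-style RLVR with omni-modal emotion recognition for the first time; netizens: interpretability + multimodal learning = next-generation AI
36Kr Tech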
2025-03-11T10:02:06.000000Z
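Alibaba open-sources R1-Omni, combining DeepSeek-style RLVR with omni-modal emotion recognition for the first time; netizens: interpretability + multimodal learning = next-generation AI
QbitAI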
2025-03-11T07:52:18.000000Z
The Many Faces of Reinforcement Learning: Shaping Large Language Models
Unite.AI
2025-02-13T14:47:03.000000Z
Allen AI’s Tülu 3 Just Became DeepSeek’s Unexpected Rival
Unite.AI
2025-02-01T17:40:20.000000Z
The Allen Institute for AI (AI2) Releases Tülu 3 405B: Scaling Open-Weight Post-Training with Reinforcement Learning from Verifiable Rewards (RLVR) to Surpass DeepSeek V3 and GPT-4o in Key Benchmarks
MarkTechPost@AI
2025-01-31T20:50:00.000000Z
Ai2 says its new AI model beats DeepSeek
Cnbeta
2025-01-31T06:04:15.000000Z