Trending
Articles related to "Process Reward Model"
This AI Paper Introduces WEB-SHEPHERD: A Process Reward Model for Web Agents with 40K Dataset and 10× Cost Efficiency
MarkTechPost@AI 2025-05-29T02:45:52.000000Z
Can 1B LLM Surpass 405B LLM? Optimizing Computation for Small LLMs to Outperform Larger Models
MarkTechPost@AI 2025-02-13T19:29:08.000000Z
As R1 Takes Off, Tsinghua and HKUST Release the Latest Comprehensive Survey of Reinforced Reasoning Techniques for Large Models
PaperAgent 2025-01-25T17:18:49.000000Z
The Qwen (Tongyi Qianwen) Team Open-Sources a Brand-New Process Reward Model (PRM)!
ModelScope Community 2025-01-20T16:07:49.000000Z
This AI Paper Explores Reinforced Learning and Process Reward Models: Advancing LLM Reasoning with Scalable Data and Test-Time Scaling
MarkTechPost@AI 2025-01-19T19:34:57.000000Z
Scaling Test-Time Compute with Open Models
Hugging Face 2024-12-31T11:00:27.000000Z
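Process Reward Models (PRMs) Become the Go-To Answer! Google DeepMind's PAV Fully Automates Step-by-Step Reward Labeling, Improving Accuracy by 8%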
新智元 (Xinzhiyuan) 2024-11-16T14:16:08.000000Z