"Process Reward Model": Related Articles
This AI Paper Introduces WEB-SHEPHERD: A Process Reward Model for Web Agents with 40K Dataset and 10× Cost Efficiency
MarkTechPost@AI
2025-05-29T02:45:52.000000Z
Can 1B LLM Surpass 405B LLM? Optimizing Computation for Small LLMs to Outperform Larger Models
MarkTechPost@AI
2025-02-13T19:29:08.000000Z
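As R1 Takes Off, Tsinghua University and HKUST Release the Latest Comprehensive Survey of Reinforced Reasoning Techniques for Large Models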
PaperAgent
2025-01-25T17:18:49.000000Z
The Qwen (Tongyi Qianwen) Team Open-Sources a New Process Reward Model (PRM)!
ModelScope Community
2025-01-20T16:07:49.000000Z
This AI Paper Explores Reinforced Learning and Process Reward Models: Advancing LLM Reasoning with Scalable Data and Test-Time Scaling
MarkTechPost@AI
2025-01-19T19:34:57.000000Z
Scaling Test-Time Compute with Open Models
Hugging Face
2024-12-31T11:00:27.000000Z
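Process Reward Models (PRMs) Are the New Meta! Google DeepMind's PAV Annotates Step-Level Rewards Fully Automatically, Improving Accuracy by 8%
Xinzhiyuan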
2024-11-16T14:16:08.000000Z