Articles related to "credit assignment"
GTPO and GRPO-S: Token and Sequence-Level Reward Shaping with Policy Entropy
cs.AI updates on arXiv.org
2025-08-07T04:49:24.000000Z
CAPO: Towards Enhancing LLM Reasoning through Verifiable Generative Credit Assignment
cs.AI updates on arXiv.org
2025-08-05T11:10:02.000000Z
The challenge of hidden gifts in multi-agent reinforcement learning
cs.AI updates on arXiv.org
2025-05-28T04:03:41.000000Z
CALM: Credit Assignment with Language Models for Automated Reward Shaping in Reinforcement Learning
MarkTechPost@AI
2024-09-24T02:35:33.000000Z