Trending
Articles related to "high-entropy token"
High-Entropy Token Selection in Reinforcement Learning with Verifiable Rewards (RLVR) Improves Accuracy and Reduces Training Cost for LLMs
MarkTechPost@AI 2025-06-09T01:40:45.000000Z
Qwen and Tsinghua Team Upend Conventional Wisdom: LLM Reinforcement Learning on Only the 20% of Key Tokens Outperforms Training on All Tokens
BAAI Community (智源社区) 2025-06-06T15:33:05.000000Z