Trending
Articles related to "Group Relative Policy Optimization"
GTPO: Trajectory-Based Policy Optimization in Large Language Models
cs.AI updates on arXiv.org 2025-08-07T04:12:39.000000Z