Trending
Articles related to "trajectory sampling"
Echo: Decoupling Inference and Training for Large-Scale RL Alignment on Heterogeneous Swarms
cs.AI updates on arXiv.org 2025-08-08T04:36:23.000000Z
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle
cs.AI updates on arXiv.org 2025-08-08T04:17:24.000000Z