Articles related to "VFT"
Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning
cs.AI updates on arXiv.org 2025-07-01T04:13:55.000000Z