Articles related to "Reward Hacking"
Reward hacking is becoming more sophisticated and deliberate in frontier LLMs
LessWrong
2025-04-24T16:07:40.000000Z
o3 Is a Lying Liar
LessWrong
2025-04-23T20:02:32.000000Z
MONA: Three Month Later - Updates and Steganography Without Optimization Pressure
LessWrong
2025-04-12T23:17:19.000000Z
MONA: Managed Myopia with Approval Feedback
LessWrong
2025-01-23T12:37:32.000000Z
Reward Hacking in Reinforcement Learning
Lil'Log
2024-12-02T04:05:33.000000Z