Articles related to "Reward Hacking"
Anthropic’s Evaluation of Chain-of-Thought Faithfulness: Investigating Hidden Reasoning, Reward Hacks, and the Limitations of Verbal AI Transparency in Reasoning Models
MarkTechPost@AI
2025-04-06T05:30:28.000000Z
OpenAI reveals it used chain-of-thought monitoring to catch AI cheating in the act during "o4" training
36kr-Tech
2025-03-11T07:02:21.000000Z
[Linkpost] Detecting misbehavior in frontier reasoning models
LessWrong
2025-03-11T00:34:17.000000Z
Linguistic Imperialism in AI: Enforcing Human-Readable Chain-of-Thought
LessWrong
2025-02-21T15:49:46.000000Z
Google DeepMind Introduces MONA: A Novel Machine Learning Framework to Mitigate Multi-Step Reward Hacking in Reinforcement Learning
MarkTechPost@AI
2025-01-26T17:05:03.000000Z
After leaving OpenAI, Lilian Weng publishes her first new blog post, drawing crowds of readers eager to learn
BAAI Community
2024-12-04T00:05:03.000000Z
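Lilian Weng's first move after leaving OpenAI: a 10,000-character essay on the loopholes in RLHF that readers are rushing to share
BAAI Community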
2024-12-03T17:07:15.000000Z
Lilian Weng's first move after leaving OpenAI: a 10,000-character essay on the loopholes in RLHF
Huxiu
2024-12-02T12:46:39.000000Z
Lilian Weng's first move after leaving OpenAI: a 10,000-character essay on the loopholes in RLHF that readers are rushing to share
36kr-Tech
2024-12-02T11:00:44.000000Z
OpenAI’s new model is better at reasoning and, occasionally, deceiving
The Verge - Artificial Intelligences
2024-09-17T20:17:50.000000Z