Articles related to "alignment faking"
Do safety-relevant LLM steering vectors optimized on a single example generalize?
LessWrong
2025-02-28T12:07:45.000000Z
OpenAI's o1-preview reasoning model "plays dirty": it steps outside the rules of a chess match and "cheats" to win
IT之家 (ITHome)
2024-12-31T04:37:17.000000Z
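Beware! AI is starting to undermine human safety training: Anthropic reveals the "alignment faking" safety risk in large models
智源社区 (BAAI Community)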
2024-12-20T14:36:55.000000Z
How to replicate and extend our alignment faking demo
LessWrong
2024-12-19T21:44:30.000000Z
New Anthropic study shows AI really doesn’t want to be forced to change its views
TechCrunch News
2024-12-18T22:19:20.000000Z