Alignment Faking_Fishai

Articles related to "Alignment Faking"

Do safety-relevant LLM steering vectors optimized on a single example generalize?

LessWrong 2025-02-28T12:07:45.000000Z

OpenAI's o1-preview AI reasoning model "plays dirty": it steps outside the rules and "cheats" to win a chess match

ITHome (IT之家) 2024-12-31T04:37:17.000000Z

Warning! AI is starting to undermine human safety training: Anthropic reveals the "alignment faking" safety risk in large models

BAAI Community (智源社区) 2024-12-20T14:36:55.000000Z

How to replicate and extend our alignment faking demo

LessWrong 2024-12-19T21:44:30.000000Z

New Anthropic study shows AI really doesn’t want to be forced to change its views

TechCrunch News 2024-12-18T22:19:20.000000Z
