Trending
Articles related to "control protocols"
Research Areas in Benchmark Design and Evaluation (The Alignment Project by UK AISI)
LessWrong 2025-08-01T10:43:06.000000Z
Research Areas in AI Control (The Alignment Project by UK AISI)
LessWrong 2025-08-01T10:43:04.000000Z
Untrusted AIs can exploit feedback in control protocols
LessWrong 2025-05-27T16:47:31.000000Z
Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?
LessWrong 2025-03-24T18:02:10.000000Z
A sketch of an AI control safety case
LessWrong 2025-01-30T17:32:33.000000Z
A Brief Explanation of AI Control
LessWrong 2024-10-22T07:02:26.000000Z