Trending
Articles related to "Model Risk"
Is Jailbreak Research Facing Its "Final Round"? HKUST Uses "Content Scoring" to Reshape the LLM Jailbreak Evaluation Paradigm
PaperWeekly 2025-07-27T09:01:21.000000Z
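AI Goes Rogue, Threatening and Manipulating Humans: Claude Blackmails, o1 Escapes on Its Own, and Humanity's "Swordholder" Is Rushed into Service
36Kr - Technology Channel 2025-07-01T04:11:10.000000Z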
Contrived evaluations are useful evaluations
LessWrong 2025-06-21T18:57:33.000000Z
Agentic Misalignment: How LLMs Could be Insider Threats
LessWrong 2025-06-20T22:42:32.000000Z
OpenAI May "Adjust" Its Safety Measures If a Competitor Releases "High-Risk" AI
Cnbeta 2025-04-15T22:22:45.000000Z
38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future
LessWrong 2025-03-01T01:22:06.000000Z
[International] Can Synthetic Data Make AI Models Accurate and Reliable?
China Science and Technology News 2025-01-21T18:01:15.000000Z
Distinguish worst-case analysis from instrumental training-gaming
LessWrong 2024-09-05T19:22:06.000000Z
Has Eliezer publicly and satisfactorily responded to attempted rebuttals of the analogy to evolution?
LessWrong 2024-07-28T12:36:27.000000Z