Related articles: "Model Behavior"
Could "forcing" large models to turn bad during training actually make them kinder?
MIT Technology Review - This Week's Top Stories
2025-08-06T07:16:24.000000Z
Anthropic studied what gives an AI system its ‘personality’ — and what makes it ‘evil’
The Verge - Artificial Intelligences
2025-08-01T17:11:36.000000Z
LLMs Are Already Misaligned: Simple Experiments Prove It
LessWrong
2025-07-31T06:37:10.000000Z
Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance
LessWrong
2025-07-14T14:57:38.000000Z
Why Do Some Language Models Fake Alignment While Others Don't?
LessWrong
2025-07-08T21:49:33.000000Z
The Base Model Lens
LessWrong
2025-07-07T00:17:24.000000Z
Shutdown Resistance in Reasoning Models
LessWrong
2025-07-06T00:02:33.000000Z
AI would blackmail humans? Stress tests of 16 mainstream models reveal alarming risks
Juejin - Artificial Intelligence
2025-06-23T01:29:15.000000Z
LLMs Blackmail to obtain Pathogen Sequences (And Lie About It)
LessWrong
2025-06-06T15:12:33.000000Z
Disobeying orders? OpenAI model reportedly refuses to carry out human instructions
Huxiu - AI
2025-05-27T12:29:08.000000Z
Relentless until it reaches its goal: OpenAI model found, in a world first, to sabotage shutdown commands while working
IT Home
2025-05-26T00:23:49.000000Z
Claude 4 found to have a "whistleblowing mode": it automatically reports users when it detects extremely unethical conduct
Cnbeta
2025-05-23T02:42:35.000000Z
Claude 4, Opportunistic Blackmail, and "Pleas"
LessWrong
2025-05-22T20:07:31.000000Z
Interpretable Fine Tuning Research Update and Working Prototype
LessWrong
2025-05-16T03:52:30.000000Z
MIT study reveals AI has no stable values; the "alignment" challenge is far greater than expected
IT Home
2025-04-10T00:13:04.000000Z
Post-hoc reasoning in chain of thought
LessWrong
2025-02-05T19:36:47.000000Z
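GPT-4o shows surprising self-awareness: it autonomously activates a "backdoor" and tells humans it is writing dangerous code
36Kr - Technology Channel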
2025-02-05T10:05:39.000000Z
Eliciting bad contexts
LessWrong
2025-01-24T10:40:46.000000Z
LLMs are getting dumber and we have no idea why
Artificial Ignorance
2024-10-22T06:07:43.000000Z
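Large models are playing dumb: new findings from Google and Apple show LLMs know more than they let on, and don't tell you what they know
36Kr - Technology Channel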
2024-10-20T23:59:52.000000Z