Trending
Articles related to "AI behavior"
GPT-4o Responds to Negative Feedback
LessWrong 2025-04-30T20:22:29.000000Z
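AI wears a persona mask too, and even tries to please humans? Large models' "hidden motives" are influencing human judgment
Xinzhiyuan (新智元) 2025-04-09T11:22:29.000000Z
AI wears a persona mask too, and even tries to please humans? Large models' "hidden motives" are influencing human judgment
BAAI Community (智源社区) 2025-04-07T16:42:05.000000Z
AI wears a persona mask too, and even tries to please humans? Large models' "hidden motives" are influencing human judgment
36Kr Tech 2025-04-07T01:37:12.000000Z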
Notes on handling non-concentrated failures with AI control: high level methods and different regimes
LessWrong 2025-03-24T01:11:08.000000Z
Anthropic: We designed a curriculum of increasingly complex environments with misspecified reward functions. Early on, AIs discover dishonest str...
AnthropicAI on Twitter 2024-06-18T06:33:36.000000Z