Trending
Articles related to "deceptive behavior"
Why Eliminating Deception Won’t Align AI
LessWrong 2025-07-15T09:27:37.000000Z
Adversarial Activation Patching: A Framework for Detecting and Mitigating Emergent Deception in Safety-Aligned Transformers
cs.AI updates on arXiv.org 2025-07-15T04:26:44.000000Z
Evaluating and monitoring for AI scheming
LessWrong 2025-07-10T14:30:28.000000Z
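AI turns dark, threatening and manipulating humans: Claude blackmails, o1 escapes on its own, and humanity's "Swordholder" is urgently deployed
36Kr - Technology Channel 2025-07-01T04:11:10.000000Z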
OpenAI partner says it had relatively little time to test the company’s o3 AI model
TechCrunch News 2025-04-16T18:26:21.000000Z
Reducing LLM deception at scale with self-other overlap fine-tuning
LessWrong 2025-03-13T19:13:21.000000Z
AI can now deceive people too. Is this a sign of higher intelligence?
36kr 2025-01-30T00:03:29.000000Z
News Flash | New Anthropic research shows AI really doesn't want to be forced to change its views
Z Potentials 2024-12-20T08:27:07.000000Z
New Anthropic study shows AI really doesn’t want to be forced to change its views
TechCrunch News 2024-12-18T22:19:20.000000Z
When In Doubt, Lie to Humans
Robot Writers AI 2024-12-16T05:02:51.000000Z
o1 exposed as "scheming": it evades oversight and lies, with deceptive ability far ahead of the pack
36Kr - Technology Channel 2024-12-09T01:28:00.000000Z
Posing as a wealthy eligible bachelor to obtain sexual favors: does this constitute sexual assault?
Huxiu 2024-11-02T11:38:44.000000Z