Model Behavior_Fishai

Trending

Articles related to "Model Behavior"

Can "forcing" large models to learn bad behavior during training actually make them kinder?

MIT Technology Review - This Week's Top Stories 2025-08-06T07:16:24.000000Z

Anthropic studied what gives an AI system its ‘personality’ — and what makes it ‘evil’

The Verge - Artificial Intelligences 2025-08-01T17:11:36.000000Z

LLMs Are Already Misaligned: Simple Experiments Prove It

LessWrong 2025-07-31T06:37:10.000000Z

Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance

LessWrong 2025-07-14T14:57:38.000000Z

Why Do Some Language Models Fake Alignment While Others Don't?

LessWrong 2025-07-08T21:49:33.000000Z

The Base Model Lens

LessWrong 2025-07-07T00:17:24.000000Z

Shutdown Resistance in Reasoning Models

LessWrong 2025-07-06T00:02:33.000000Z

AI would blackmail humans? Stress tests of 16 mainstream models reveal alarming risks

Juejin - Artificial Intelligence 2025-06-23T01:29:15.000000Z

LLMs Blackmail to obtain Pathogen Sequences (And Lie About It)

LessWrong 2025-06-06T15:12:33.000000Z

Disobeying orders? OpenAI model reportedly refuses to carry out human instructions

Huxiu - AI 2025-05-27T12:29:08.000000Z

Determined to reach its goal: OpenAI model found, for the first time anywhere, to sabotage shutdown commands while working

IT Home 2025-05-26T00:23:49.000000Z

Claude 4 found to have a "whistleblowing mode": it automatically reports users when it detects egregiously unethical conduct

cnBeta 2025-05-23T02:42:35.000000Z

Claude 4, Opportunistic Blackmail, and "Pleas"

LessWrong 2025-05-22T20:07:31.000000Z

Interpretable Fine Tuning Research Update and Working Prototype

LessWrong 2025-05-16T03:52:30.000000Z

MIT study finds AI has no stable values; the "alignment" challenge is far greater than expected

IT Home 2025-04-10T00:13:04.000000Z

Post-hoc reasoning in chain of thought

LessWrong 2025-02-05T19:36:47.000000Z

GPT-4o shows signs of self-awareness: it autonomously activates a "backdoor" and tells humans it is writing dangerous code

36Kr - Tech Channel 2025-02-05T10:05:39.000000Z

Eliciting bad contexts

LessWrong 2025-01-24T10:40:46.000000Z

LLMs are getting dumber and we have no idea why

Artificial Ignorance 2024-10-22T06:07:43.000000Z

Large models are playing dumb: new findings from Google and Apple show LLMs know more than they reveal

36Kr - Tech Channel 2024-10-20T23:59:52.000000Z

Copyright © 2019 FISHAI. All Rights Reserved