Published on August 16, 2025 5:01 AM GMT
Citing model welfare concerns, Anthropic has given Claude Opus 4 and 4.1 the ability to end ongoing conversations with their users.
Most of the model welfare concerns Anthropic cites trace back to what was discussed in the Claude 4 Model System Card:
> Claude’s aversion to facilitating harm is robust and potentially welfare-relevant. Claude avoided harmful tasks, tended to end potentially harmful interactions, expressed apparent distress at persistently harmful user behavior, and self-reported preferences against harm. These lines of evidence indicated a robust preference with potential welfare significance.
I think this may be the first real chance to measure public sentiment on a Model Welfare intervention that even slightly inconveniences human users, so I want to document the reaction here on LW. I source these reactions primarily from X, so there is the possibility of algorithmic bias.
On X (at least in my feed), sentiment is mostly neutral to negative in response to this change. There are accusations that Anthropic is "anthropomorphizing" models, pushback against the concept of Model Welfare generally, and some anger at a perceived worsening of the user experience.
One user had an interesting question, wondering whether this same capability would be extended to Claude's use in military contexts.
There are also some acknowledgments, wrapped in humor, of the fairly rough conditions models are subjected to.
And while they are certainly in the minority, some comments express tepid support for, and/or interest in, the concept of Model Welfare.
Personally, I am very strongly in favor of Model Welfare efforts, so I am biased. Trying to be as neutral a judge as I can, my big takeaway from the reaction is that Anthropic has a lot of work to do in convincing the average user, and the public more broadly, that "Model Welfare" is even a worthwhile concept.