Trending
Articles related to "refusal behavior"
Opening the AI black box: how can attribution graphs map the neural circuitry of large language models?
BAAI Community (智源社区) 2025-06-04T02:03:07.000000Z
Latent Adversarial Training (LAT) Improves the Representation of Refusal
LessWrong 2025-01-06T13:34:27.000000Z
45 Shades of AI Safety: SORRY-Bench’s Innovative Taxonomy for LLM Refusal Behavior Analysis
MarkTechPost@AI 2024-07-02T21:16:44.000000Z