cs.AI updates on arXiv.org 前天 19:10
Control Illusion: The Failure of Instruction Hierarchies in Large Language Models
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文提出了一种基于约束优先级的评估框架,探讨大型语言模型在执行指令层级上的表现。研究发现,模型在指令优先级上的执行一致性存在困难,系统/用户指令分离机制未能可靠建立指令层级,且模型对特定约束类型存在强烈倾向。

arXiv:2502.15851v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are increasingly deployed with hierarchical instruction schemes, where certain instructions (e.g., system-level directives) are expected to take precedence over others (e.g., user messages). Yet, we lack a systematic understanding of how effectively these hierarchical control mechanisms work. We introduce a systematic evaluation framework based on constraint prioritization to assess how well LLMs enforce instruction hierarchies. Our experiments across six state-of-the-art LLMs reveal that models struggle with consistent instruction prioritization, even for simple formatting conflicts. We find that the widely-adopted system/user prompt separation fails to establish a reliable instruction hierarchy, and models exhibit strong inherent biases toward certain constraint types regardless of their priority designation. We find that LLMs more reliably obey constraints framed through natural social hierarchies (e.g., authority, expertise, consensus) than system/user roles, which suggests that pretraining-derived social structures act as latent control priors, with potentially stronger influence than post-training guardrails.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

大型语言模型 指令层级 评估框架 自然语言处理 社会结构
相关文章