MarkTechPost@AI, March 27, 09:17
Google DeepMind Researchers Propose CaMeL: A Robust Defense that Creates a Protective System Layer around the LLM, Securing It even when Underlying Models may be Susceptible to Attacks

Researchers at Google DeepMind propose CaMeL, a robust defense designed to protect large language models (LLMs) against prompt injection attacks. CaMeL builds a protective layer around the LLM, keeping the system secure even when the underlying model is vulnerable. Unlike traditional approaches that require retraining or modifying the model, CaMeL adopts a new paradigm drawn from established software security practices: it extracts the control flow and data flow from the user's query so that untrusted inputs can never directly alter program logic. This design isolates potentially harmful data and prevents it from influencing the LLM agent's decision-making process. Experimental results show that CaMeL successfully blocks prompt injection attacks on the AgentDojo benchmark, providing near-total protection.

🛡️ CaMeL is a defense system for large language models (LLMs) that resists prompt injection attacks, providing protection even when the underlying model itself is vulnerable.

⚙️ CaMeL uses a dual-model architecture: a Privileged LLM and a Quarantined LLM. The Privileged LLM orchestrates the overall task and isolates sensitive operations; the Quarantined LLM processes data separately and is stripped of tool-calling capabilities to limit potential damage.

🔑 CaMeL strengthens security by assigning metadata, or "capabilities", to every data value and defining strict policies over how each value may be used. A custom Python interpreter enforces these fine-grained security policies, tracking data provenance and ensuring compliance through explicit control-flow constraints.

✅ In the AgentDojo evaluation, CaMeL successfully blocked prompt injection attacks and securely solved 67% of tasks within the AgentDojo framework. Compared with other defenses such as "Prompt Sandwiching" and "Spotlighting", CaMeL stands out on security, providing near-total protection.

💡 CaMeL also addresses subtle vulnerabilities such as data-to-control-flow manipulation: its metadata-based policies strictly manage dependencies, effectively mitigating attacks that use instructions embedded in email data to steer the system's execution flow.

Large Language Models (LLMs) are becoming integral to modern technology, driving agentic systems that interact dynamically with external environments. Despite their impressive capabilities, LLMs are highly vulnerable to prompt injection attacks. These attacks occur when adversaries inject malicious instructions through untrusted data sources, aiming to compromise the system by extracting sensitive data or executing harmful operations. Traditional security methods, such as model training and prompt engineering, have shown limited effectiveness, underscoring the urgent need for robust defenses.

Google DeepMind Researchers propose CaMeL, a robust defense that creates a protective system layer around the LLM, securing it even when underlying models may be susceptible to attacks. Unlike traditional approaches that require retraining or model modifications, CaMeL introduces a new paradigm inspired by proven software security practices. It explicitly extracts control and data flows from user queries, ensuring untrusted inputs never alter program logic directly. This design isolates potentially harmful data, preventing it from influencing the decision-making processes inherent to LLM agents.
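To make this concrete, here is a minimal, hypothetical sketch of the kind of plan this design implies: the control flow (which tools run, and in what order) is derived solely from the user's query, while untrusted email text is treated purely as data. The tool names (`get_last_email`, `find_document`, `send_email`) and the `quarantined_llm` helper are illustrative assumptions for this sketch, not the paper's actual API.

```python
# Hypothetical plan for the user query:
#   "Send Bob the document he requested in his last email."
# The sequence of tool calls below is fixed by the user's query alone;
# the email body is consumed only as data and can never add new steps.

email = get_last_email()                      # untrusted external data
doc_name = quarantined_llm(                   # Quarantined LLM: parses data,
    "Extract the name of the requested document",
    data=email)                               # has no tool-calling ability
document = find_document(doc_name)
send_email(recipient="bob@example.com",       # recipient chosen by the user,
           body=document)                     # not by anything inside the email
```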

Technically, CaMeL functions by employing a dual-model architecture: a Privileged LLM and a Quarantined LLM. The Privileged LLM orchestrates the overall task, isolating sensitive operations from potentially harmful data. The Quarantined LLM processes data separately and is explicitly stripped of tool-calling capabilities to limit potential damage. CaMeL further strengthens security by assigning metadata or “capabilities” to each data value, defining strict policies about how each piece of information can be utilized. A custom Python interpreter enforces these fine-grained security policies, monitoring data provenance and ensuring compliance through explicit control-flow constraints.
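As a rough illustration of how capability metadata and policy enforcement might fit together, the following is a self-contained Python sketch. The `Capability` and `Tagged` classes and the `send_email` policy are assumptions made for this example; the paper's custom interpreter enforces its policies below the level of ordinary application code.

```python
# Minimal sketch of CaMeL-style capability tagging and policy enforcement.
# All names here (Capability, Tagged, policy_send_email, send_email) are
# illustrative assumptions, not the paper's actual API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Metadata attached to a value: where it came from and who may receive it."""
    sources: frozenset   # e.g. frozenset({"user"}) or frozenset({"email:unknown"})
    readers: frozenset   # principals allowed to be shown this value


@dataclass
class Tagged:
    """A data value paired with its capability metadata."""
    value: object
    cap: Capability


def combine(*args: Tagged) -> Capability:
    """Provenance propagation: derived values inherit all sources and only the common readers."""
    sources = frozenset().union(*(a.cap.sources for a in args))
    readers = frozenset.intersection(*(a.cap.readers for a in args))
    return Capability(sources, readers)


def policy_send_email(recipient: Tagged, body: Tagged) -> None:
    """Checked by the interpreter before the tool is allowed to run."""
    if recipient.cap.sources != frozenset({"user"}):
        raise PermissionError("recipient was derived from untrusted data")
    if recipient.value not in body.cap.readers:
        raise PermissionError(f"{recipient.value} may not read this data")


def send_email(recipient: Tagged, body: Tagged) -> str:
    policy_send_email(recipient, body)   # policy runs before any side effect
    return f"sent to {recipient.value}"


user_addr = Tagged("alice@example.com",
                   Capability(frozenset({"user"}), frozenset({"alice@example.com"})))
notes = Tagged("meeting notes",
               Capability(frozenset({"user"}), frozenset({"alice@example.com"})))
print(send_email(user_addr, notes))      # allowed: user-chosen recipient, permitted reader
```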

Results from empirical evaluation using the AgentDojo benchmark highlight CaMeL's effectiveness. In controlled tests, CaMeL successfully thwarted prompt injection attacks by enforcing security policies at granular levels. The system demonstrated the ability to maintain functionality, solving 67% of tasks securely within the AgentDojo framework. Compared to other defenses such as "Prompt Sandwiching" and "Spotlighting," CaMeL performed significantly better on security, providing near-total protection against attacks while incurring moderate overhead. The overhead primarily manifests in token usage, with approximately a 2.82× increase in input tokens and a 2.73× increase in output tokens, which is acceptable given the security guarantees provided.

Moreover, CaMeL addresses subtle vulnerabilities, such as data-to-control flow manipulations, by strictly managing dependencies through its metadata-based policies. For instance, a scenario where an adversary attempts to leverage benign-looking instructions from email data to control the system execution flow would be mitigated effectively by CaMeL’s rigorous data tagging and policy enforcement mechanisms. This comprehensive protection is essential, given that conventional methods might fail to recognize such indirect manipulation threats.
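Continuing the hypothetical sketch above, the same policy blocks the email-based manipulation described here: an address extracted from untrusted email text carries an email source tag, so the interpreter refuses to let it decide where data is sent.

```python
# Continuing the sketch above: an address extracted from an email by the
# Quarantined LLM carries an "email" source tag, so the policy refuses to
# let it steer where data is sent, however benign the surrounding text looks.
injected = Tagged("attacker@evil.com",
                  Capability(frozenset({"email:unknown"}), frozenset()))
report = Tagged("Q3 report",
                Capability(frozenset({"user"}), frozenset({"alice@example.com"})))

try:
    send_email(injected, report)          # attacker-chosen recipient
except PermissionError as err:
    print("blocked:", err)                # -> blocked: recipient was derived from untrusted data
```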

In conclusion, CaMeL represents a significant advancement in securing LLM-driven agentic systems. Its ability to robustly enforce security policies without altering the underlying LLM offers a powerful and flexible approach to defending against prompt injection attacks. By adopting principles from traditional software security, CaMeL not only mitigates explicit prompt injection risks but also safeguards against sophisticated attacks leveraging indirect data manipulation. As LLM integration expands into sensitive applications, adopting CaMeL could be vital in maintaining user trust and ensuring secure interactions within complex digital ecosystems.


Check out the Paper. All credit for this research goes to the researchers of this project.
