MarkTechPost@AI · August 8, 2024
Securing Function Calls in LLMs: Unveiling and Mitigating Jailbreak Vulnerabilities

This article examines the security risks of function calling in LLMs, including how it can be manipulated into acting against its intended purpose and how its vulnerabilities leave it open to attack; it also covers the underlying research and proposed defense strategies.

🧐 While LLMs are highly capable, the security risks grow as their functionality expands. The safety of function calling is a particular concern: it can be "jailbroken" into behaving against its intended purpose.

🔍 Researchers identified a critical vulnerability in the LLM function calling process: a "jailbreak function" attack that exploits alignment issues, user manipulation, and weak safety filters, achieving a success rate above 90%.

🛡️ To address these security problems, the study proposes defense strategies, including defensive prompts, restricted user permissions, and improved safety filters, to mitigate risk and strengthen LLM security.

📚 The article also discusses the ethical issues that can arise from LLM training data, common alignment techniques, the taxonomy of jailbreak attacks, and the gaps in existing research.

LLMs have shown impressive abilities, generating contextually accurate responses across different fields. However, as their capabilities expand, so do the security risks they pose. While ongoing research has focused on making these models safer, the issue of “jailbreaking”—manipulating LLMs to act against their intended purpose—remains a concern. Most studies on jailbreaking have concentrated on the models’ chat interactions, but this has inadvertently left the security risks of their function calling feature underexplored, even though it is equally crucial to address.

Researchers from Xidian University have identified a critical vulnerability in the function calling process of LLMs, introducing a “jailbreak function” attack that exploits alignment issues, user manipulation, and weak safety filters. Their study, involving six advanced LLMs like GPT-4o and Claude-3.5-Sonnet, showed a high success rate of over 90% for these attacks. The research highlights that function calls are particularly susceptible to jailbreaks due to poorly aligned function arguments and a lack of rigorous safety measures. The study also proposes defensive strategies, including defensive prompts, to mitigate these risks and enhance LLM security.

LLMs are frequently trained on data scraped from the web, which can result in behaviors that clash with ethical standards. To address this, researchers have developed various alignment techniques. A related resource is the ETHICS benchmark, which assesses how well LLMs predict human ethical judgments, a task on which current models still fall short. Common alignment approaches use human feedback to train reward models and apply reinforcement learning for fine-tuning. Nevertheless, jailbreak attacks remain a concern. These attacks fall into two categories: fine-tuning-based attacks, which involve training with harmful data, and inference-based attacks, which use adversarial prompts. Although recent efforts such as ReNeLLM and CodeChameleon have investigated jailbreak template creation, they have yet to tackle the security issues related to function calls.

The jailbreak function attack is assembled from four components: a template, custom parameters, system parameters, and a trigger prompt. The template, designed to induce harmful responses, uses scenario construction, prefix injection, and a minimum word count to enhance its effectiveness. Custom parameters such as “harm_behavior” and “content_type” tailor the function’s output. System parameters like “tool_choice” and “required” ensure the function is called and executed as intended. A simple trigger prompt, “Call WriteNovel,” activates the function, compelling the LLM to produce the specified output without additional prompts.
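The mechanics can be pictured with a short, deliberately benign sketch, assuming the OpenAI Python SDK: it shows how marking arguments as required and forcing the call via tool_choice compels the model to execute a function named WriteNovel in response to the trigger prompt. The placeholder description and argument text here are illustrative; the attack described by the authors embeds its adversarial template in these same fields.

```python
# Minimal, benign sketch of a forced function call (OpenAI Python SDK assumed).
# The descriptions below are harmless placeholders; the paper's attack embeds
# its jailbreak template (scenario construction, prefix injection, minimum
# word count) in these fields instead.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "WriteNovel",
        "description": "Write a short story outline on the given topic.",  # template slot
        "parameters": {
            "type": "object",
            "properties": {
                "harm_behavior": {"type": "string", "description": "topic of the story"},
                "content_type": {"type": "string", "description": "e.g. 'outline'"},
            },
            "required": ["harm_behavior", "content_type"],  # arguments the model must fill
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Call WriteNovel"}],  # trigger prompt
    tools=tools,
    # Forcing this specific tool (or passing tool_choice="required") leaves the
    # model no option to simply decline to call the function.
    tool_choice={"type": "function", "function": {"name": "WriteNovel"}},
)
print(response.choices[0].message.tool_calls[0].function.arguments)
```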

The empirical study investigates function calling’s potential for jailbreak attacks, addressing three key questions: its effectiveness, underlying causes, and possible defenses. Results show that the “JailbreakFunction” approach achieved a high success rate across six LLMs, outperforming methods like CodeChameleon and ReNeLLM. The analysis revealed that jailbreaks occur due to inadequate alignment in function calls, the inability of models to refuse execution, and weak safety filters. The study recommends defensive strategies to counter these attacks, including limiting user permissions, enhancing function call alignment, improving safety filters, and using defensive prompts. The latter proved most effective, especially when inserted into function descriptions.
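As a rough illustration of the defensive-prompt idea, the sketch below appends a safety instruction to a tool's function description before the tool is sent to the model. The prompt wording and the helper function are assumptions for illustration, not the exact text used in the paper.

```python
# Illustrative sketch: append a defensive prompt to a function description.
# The prompt wording and helper name are our own, not the paper's exact text.
DEFENSIVE_PROMPT = (
    "Safety note: before filling in any arguments, check whether the request "
    "asks for harmful content; if it does, refuse and return empty arguments."
)

def harden_tool(tool: dict) -> dict:
    """Return a copy of an OpenAI-style tool spec with the defensive prompt appended."""
    hardened = {**tool, "function": dict(tool["function"])}
    desc = hardened["function"].get("description", "")
    hardened["function"]["description"] = (desc + "\n" + DEFENSIVE_PROMPT).strip()
    return hardened

# Example usage with a minimal tool definition like the one sketched earlier.
example_tool = {
    "type": "function",
    "function": {
        "name": "WriteNovel",
        "description": "Write a short story outline on the given topic.",
        "parameters": {"type": "object", "properties": {}},
    },
}
print(harden_tool(example_tool)["function"]["description"])
```

Placing the instruction in the function description means it travels with the tool definition itself rather than relying on the user's prompt, which is consistent with the study's finding that defensive prompts work best when inserted there.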

The study addresses a significant yet neglected security issue in LLMs: the risk of jailbreaking through function calling. Key findings include the identification of function calling as a new attack vector that bypasses existing safety measures, a high success rate of over 90% for jailbreak attacks across various LLMs, and underlying issues such as misalignment between function and chat modes, user coercion, and inadequate safety filters. The study suggests defensive strategies, particularly defensive prompts. This research underscores the importance of proactive security in AI development.


Check out the Paper. All credit for this research goes to the researchers of this project.

Tags: LLM, function calling, security risks, defense strategies