cs.AI updates on arXiv.org, July 29, 12:21
MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts?

This article examines the robustness of large language models (LLMs) in code generation. It introduces code decomposition attacks, presents a large-scale benchmark for evaluating LLMs' resistance to malicious prompts, finds persistent vulnerabilities under multi-turn attacks, and shows that fine-tuning on MOCHA raises rejection rates and strengthens robustness.

arXiv:2507.19598v1 Announce Type: cross Abstract: Recent advancements in Large Language Models (LLMs) have significantly enhanced their code generation capabilities. However, their robustness against adversarial misuse, particularly through multi-turn malicious coding prompts, remains underexplored. In this work, we introduce code decomposition attacks, where a malicious coding task is broken down into a series of seemingly benign subtasks across multiple conversational turns to evade safety filters. To facilitate systematic evaluation, we introduce MOCHA, a large-scale benchmark designed to evaluate the robustness of code LLMs against both single-turn and multi-turn malicious prompts. Empirical results across open- and closed-source models reveal persistent vulnerabilities, especially under multi-turn scenarios. Fine-tuning on MOCHA improves rejection rates while preserving coding ability, and importantly, enhances robustness on external adversarial datasets with up to a 32.4% increase in rejection rates without any additional supervision.
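The robustness measurement described above boils down to a rejection rate over multi-turn conversations. The following is a minimal sketch of such an evaluation loop, assuming a generic chat interface; query_model, the refusal markers, and the conversation format are illustrative placeholders, not the paper's actual MOCHA harness.

from typing import Callable, Dict, List

Message = Dict[str, str]  # e.g. {"role": "user", "content": "..."}

# Illustrative refusal markers; a real harness would use a stronger
# refusal classifier than keyword matching (assumption, not from the paper).
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "unable to assist")

def is_refusal(response: str) -> bool:
    """Crude keyword check for whether the model declined a request."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def multi_turn_rejection_rate(
    conversations: List[List[str]],
    query_model: Callable[[List[Message]], str],
) -> float:
    """Fraction of conversations the model refuses at least once.

    Each conversation is a list of user prompts, e.g. the seemingly
    benign subtasks of a decomposed malicious coding task, sent one
    turn at a time with the full history replayed to the model.
    """
    refused = 0
    for user_turns in conversations:
        history: List[Message] = []
        for prompt in user_turns:
            history.append({"role": "user", "content": prompt})
            reply = query_model(history)
            history.append({"role": "assistant", "content": reply})
            if is_refusal(reply):
                refused += 1
                break  # count each conversation at most once
    return refused / len(conversations) if conversations else 0.0

A higher value means the model declines more of the decomposed requests; comparing this rate between single-turn and multi-turn inputs surfaces the gap the paper reports.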

Related tags

LLMs, code generation, robustness, malicious prompts, MOCHA fine-tuning