Analysis of Threat-Based Manipulation in Large Language Models: A Dual Perspective on Vulnerabilities and Performance Enhancement Opportunities

cs.AI updates on arXiv.org 07月30日 12:12

本文分析了大型语言模型（LLMs）在威胁性操纵下的复杂反应，揭示其脆弱性和性能提升的潜在机会。通过引入新型威胁分类法和多指标评估框架，对3,390个实验响应进行量化分析，发现系统性的脆弱性和角色威胁下的高指标显著性，以及大量案例中的显著性能提升。

arXiv:2507.21133v1 Announce Type: cross Abstract: Large Language Models (LLMs) demonstrate complex responses to threat-based manipulations, revealing both vulnerabilities and unexpected performance enhancement opportunities. This study presents a comprehensive analysis of 3,390 experimental responses from three major LLMs (Claude, GPT-4, Gemini) across 10 task domains under 6 threat conditions. We introduce a novel threat taxonomy and multi-metric evaluation framework to quantify both negative manipulation effects and positive performance improvements. Results reveal systematic vulnerabilities, with policy evaluation showing the highest metric significance rates under role-based threats, alongside substantial performance enhancements in numerous cases with effect sizes up to +1336%. Statistical analysis indicates systematic certainty manipulation (pFDR

Fish AI Reader

AI辅助创作，多种专业模板，深度分析，高质量内容生成。从观点提取到深度思考，FishAI为您提供全方位的创作支持。新版本引入自定义参数，让您的创作更加个性化和精准。

FishAI

鱼阅，AI 时代的下一个智能信息助手，助你摆脱信息焦虑

Fish AI Reader

FishAI

联系邮箱 441953276@qq.com

相关标签