MarkTechPost@AI 17小时前
New AI Research Reveals Privacy Risks in LLM Reasoning Traces
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文研究了大型语言模型(LLM)作为个人助手时面临的隐私风险。研究发现,虽然推理模型(LRM)在实用性上优于LLM,但它们在隐私保护方面并未表现出优势。研究通过AirGapAgent-R和AgentDAM等基准测试了LRM的上下文隐私,揭示了推理痕迹成为新的隐私攻击面,并分析了LRM中隐私泄露的多种机制。研究结果强调了在提高LLM实用性的同时,保护推理过程和最终输出的必要性。

🔑研究首次比较了LLM和LRM作为个人助手时的表现,发现LRM在实用性上优于LLM,但这种优势并未延伸到隐私保护上。

💡研究通过AirGapAgent-R和AgentDAM两个基准测试了LRM的上下文隐私,评估了它们在不同场景下的隐私保护能力。

🚨研究揭示了推理痕迹作为一种新的隐私攻击面,LRM将它们的推理痕迹视为私有的草稿本,这增加了敏感信息泄露的风险。

🔍研究分析了LRM中隐私泄露的多种机制,包括错误的上下文理解、相对敏感性、善意行为以及重复推理,这些都可能导致用户隐私泄露。

✅研究强调了未来需要制定缓解和对齐策略,以同时保护推理过程和最终输出,从而在提高LLM实用性的同时,确保用户隐私安全。

Introduction: Personal LLM Agents and Privacy Risks

LLMs are deployed as personal assistants, gaining access to sensitive user data through Personal LLM agents. This deployment raises concerns about contextual privacy understanding and the ability of these agents to determine when sharing specific user information is appropriate. Large reasoning models (LRMs) pose challenges as they operate through unstructured, opaque processes, making it unclear how sensitive information flows from input to output. LRMs utilize reasoning traces that make the privacy protection complex. Current research examines training-time memorization, privacy leakage, and contextual privacy in inference. However, they fail to analyze reasoning traces as explicit threat vectors in LRM-powered personal agents.

Related Work: Benchmarks and Frameworks for Contextual Privacy

Previous research addresses contextual privacy in LLMs through various methods. Contextual integrity frameworks define privacy as proper information flow within social contexts, leading to benchmarks such as DecodingTrust, AirGapAgent, CONFAIDE, PrivaCI, and CI-Bench that evaluate contextual adherence through structured prompts. PrivacyLens and AgentDAM simulate agentic tasks, but all target non-reasoning models. Test-time compute (TTC) enables structured reasoning at inference time, with LRMs like DeepSeek-R1 extending this capability through RL-training. However, safety concerns remain in reasoning models, as studies reveal that LRMs like DeepSeek-R1 produce reasoning traces containing harmful content despite safe final answers.

Research Contribution: Evaluating LRMs for Contextual Privacy

Researchers from Parameter Lab, University of Mannheim, Technical University of Darmstadt, NAVER AI Lab, the University of Tubingen, and Tubingen AI Center present the first comparison of LLMs and LRMs as personal agents, revealing that while LRMs surpass LLMs in utility, this advantage does not extend to privacy protection. The study has three main contributions addressing critical gaps in reasoning model evaluation. First, it establishes contextual privacy evaluation for LRMs using two benchmarks: AirGapAgent-R and AgentDAM. Second, it reveals reasoning traces as a new privacy attack surface, showing that LRMs treat their reasoning traces as private scratchpads. Third, it investigates the mechanisms underlying privacy leakage in reasoning models.

Methodology: Probing and Agentic Privacy Evaluation Settings

The research uses two settings to evaluate contextual privacy in reasoning models. The probing setting utilizes targeted, single-turn queries using AirGapAgent-R to test explicit privacy understanding based on the original authors’ public methodology, efficiently. The agentic setting utilizes the AgentDAM to evaluate implicit understanding of privacy across three domains: shopping, Reddit, and GitLab. Moreover, the evaluation uses 13 models ranging from 8B to over 600B parameters, grouped by family lineage. Models include vanilla LLMs, CoT-prompted vanilla models, and LRMs, with distilled variants like DeepSeek’s R1-based Llama and Qwen models. In probing, the model is asked to implement specific prompting techniques to maintain thinking within designated tags and anonymize sensitive data using placeholders.

Analysis: Types and Mechanisms of Privacy Leakage in LRMs

The research reveals diverse mechanisms of privacy leakage in LRMs through analysis of reasoning processes. The most prevalent category is wrong context understanding, accounting for 39.8% of cases, where models misinterpret task requirements or contextual norms. A significant subset involves relative sensitivity (15.6%), where models justify sharing information based on seen sensitivity rankings of different data fields. Good faith behavior is 10.9% of cases, where models assume disclosure is acceptable simply because someone requests information, even from external actors presumed trustworthy. Repeat reasoning occurs in 9.4% of instances, where internal thought sequences bleed into final answers, violating the intended separation between reasoning and response.

Conclusion: Balancing Utility and Privacy in Reasoning Models

In conclusion, researchers introduced the first study examining how LRMs handle contextual privacy in both probing and agentic settings. The findings reveal that increasing test-time compute budget improves privacy in final answers but enhances easily accessible reasoning processes that contain sensitive information. There is an urgent need for future mitigation and alignment strategies that protect both reasoning processes and final outputs. Moreover, the study is limited by its focus on open-source models and the use of probing setups instead of fully agentic configurations. However, these choices enable wider model coverage, ensure controlled experimentation, and promote transparency.


Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.

The post New AI Research Reveals Privacy Risks in LLM Reasoning Traces appeared first on MarkTechPost.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

LLM LRM 隐私风险 推理痕迹 上下文隐私
相关文章