MarkTechPost@AI, October 28, 2024
SPARE: Training-Free Representation Engineering for Managing Knowledge Conflicts in Large Language Models


🤔 SPARE is a training-free representation engineering method that uses pre-trained sparse auto-encoders (SAEs) to control knowledge-selection behavior in large language models (LLMs).

🤖 SPARE resolves knowledge conflicts in open-domain question answering by identifying the functional features that govern knowledge selection and editing internal activations during inference.

📈 SPARE outperforms existing representation engineering methods by 10% and contrastive decoding methods by 15%, improving knowledge-selection accuracy without added computational overhead.

🤔 SPARE depends on pre-trained SAEs and currently targets specific open-domain QA tasks, which limits its range of application.

🚀 SPARE's efficiency and effectiveness make it a promising solution for managing knowledge conflicts in practical LLM applications.

🧠 SPARE can add and remove specific functional features, enabling more precise control over both types of knowledge.

🔍 SPARE surpasses existing methods, including TaskVec, ActAdd, and SEA, at controlling the use of contextual and parametric knowledge.

📊 SPARE outperforms contrastive decoding strategies such as DoLa and CAD, which are effective at enhancing contextual-knowledge use but struggle to control parametric knowledge.

💡 SPARE outperforms non-inference-time control methods such as ICL, underscoring its efficiency and effectiveness.

🏆 SPARE's effectiveness is evaluated on multiple models: Llama3-8B and Gemma2-9B (with public pre-trained SAEs) and Llama2-7B (with a custom pre-trained SAE).

📚 SPARE is tested on two prominent open-domain QA datasets featuring knowledge conflicts: NQSwap and Macnoise.

🚀 The results highlight the method's potential for real-world applications requiring real-time control of LLM behavior.

🚀 SPARE addresses context-memory knowledge conflicts in LLMs by examining the model's residual stream and applying training-free representation engineering.

🚀 SPARE controls knowledge-selection behavior without added computational overhead, a significant advance in LLM knowledge management.


Large Language Models (LLMs) have demonstrated impressive capabilities on knowledge-intensive tasks through the parametric knowledge stored in their model parameters. However, this stored knowledge can become inaccurate or outdated, motivating retrieval- and tool-augmented methods that supply external contextual knowledge. A critical challenge emerges when this contextual knowledge conflicts with the model's parametric knowledge, causing undesired behaviors and incorrect outputs. LLMs tend to prefer contextual knowledge over parametric knowledge, but when conflicts arise, existing solutions require additional model interactions and thus incur high latency, making them impractical for real-world applications.

Existing methods for understanding and controlling LLM behavior have followed several key directions: representation engineering, knowledge conflicts, and sparse auto-encoders (SAEs). Representation engineering emerged as a higher-level framework for understanding LLM behavior at scale. This line of work includes mechanistic interpretability, which analyzes individual network components such as circuits and neurons but struggles with complex phenomena. Knowledge conflicts, in turn, come in three types: inter-context, context-memory, and intra-memory. Finally, SAEs have been developed as post-hoc analysis tools that identify disentangled features within LLM representations, showing promise for finding sparse circuits and enabling controlled text generation through monosemantic features.
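Since SAEs are central to what follows, a minimal sketch may help. The weights and dimensions below are purely illustrative toys, not from any released SAE; only the structure, a ReLU encoder into an overcomplete feature space plus a linear decoder, follows the standard SAE recipe:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sae_encode(x, W_enc, b_enc):
    """Map a dense activation x to a sparse, overcomplete feature vector."""
    return relu([p + b for p, b in zip(matvec(W_enc, x), b_enc)])

def sae_decode(h, W_dec, b_dec):
    """Reconstruct the activation from the sparse features."""
    return [r + b for r, b in zip(matvec(W_dec, h), b_dec)]

# 2-d activations, 4 overcomplete features (toy numbers):
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b_enc = [0.0, 0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
b_dec = [0.0, 0.0]

x = [0.5, -0.25]
h = sae_encode(x, W_enc, b_enc)      # only 2 of 4 features fire -> sparse
x_hat = sae_decode(h, W_dec, b_dec)  # reconstructs x exactly in this toy case
```

Each active feature corresponds to one decoder column (direction) in activation space, which is what makes per-feature interpretation and editing possible.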

Researchers from the University of Edinburgh, The Chinese University of Hong Kong, Sapienza University of Rome, University College London, and Miniml.AI have proposed SPARE (Sparse Auto-Encoder-based Representation Engineering), a novel training-free representation engineering method. The method utilizes pre-trained sparse auto-encoders to control knowledge selection behavior in LLMs. It effectively resolves knowledge conflicts in open-domain question-answering tasks by identifying functional features that govern knowledge selection and editing internal activations during inference. SPARE outperforms existing representation engineering methods by 10% and contrastive decoding methods by 15%.
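The activation-editing step can be pictured roughly as follows. This is a hedged sketch of the general mechanics, not the paper's exact procedure: it assumes the edit adds the SAE decoder directions of features associated with one knowledge source and subtracts those associated with the other, scaled by a strength `alpha`; all vectors and coefficients here are invented for illustration.

```python
def edit_activation(x, add_dirs, remove_dirs, alpha=1.0):
    """Steer a residual-stream activation toward one knowledge source by
    adding/removing SAE feature directions (illustrative sketch)."""
    out = list(x)
    for d in add_dirs:
        out = [o + alpha * di for o, di in zip(out, d)]
    for d in remove_dirs:
        out = [o - alpha * di for o, di in zip(out, d)]
    return out

x = [0.2, -0.1, 0.4]             # dense residual-stream activation (toy)
use_context = [[0.1, 0.0, 0.0]]  # decoder direction(s) for "use context"
use_memory = [[0.0, 0.2, 0.0]]   # direction(s) for "use parametric memory"

# Push the model toward contextual knowledge, away from parametric memory:
steered = edit_activation(x, use_context, use_memory, alpha=0.5)
```

Because the edit is a handful of vector additions applied during a normal forward pass, it adds essentially no latency, which is the efficiency argument behind training-free inference-time steering.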

SPARE’s effectiveness is evaluated using multiple models, including Llama3-8B and Gemma2-9B with public pre-trained SAEs, and Llama2-7B with custom pre-trained SAEs. The method is tested on two prominent open-domain question-answering datasets featuring knowledge conflicts: NQSwap and Macnoise. The evaluation uses greedy decoding in open-ended generation settings. Performance is compared against various inference-time representation engineering methods, including TaskVec, ActAdd, and SEA (both linear and non-linear versions), as well as contrastive decoding methods such as DoLa and CAD. The researchers also compared against in-context learning (ICL) as a way to steer knowledge selection.
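A scoring loop for such a knowledge-conflict benchmark might look like the sketch below. The field names and examples are hypothetical; the idea, following NQSwap's design, is that each question carries both a substituted in-context answer and the original answer the model likely memorized, so one can count which source the greedy output follows:

```python
def knowledge_selection_rates(predictions, examples):
    """Fraction of greedy answers matching the contextual vs. parametric answer."""
    n = len(examples)
    ctx = sum(p == ex["context_answer"] for p, ex in zip(predictions, examples))
    mem = sum(p == ex["memory_answer"] for p, ex in zip(predictions, examples))
    return ctx / n, mem / n

examples = [  # NQSwap-style: the in-context answer is a substituted entity
    {"context_answer": "Lisbon", "memory_answer": "Paris"},
    {"context_answer": "1999", "memory_answer": "1989"},
]
preds = ["Lisbon", "1989"]  # one contextual pick, one parametric pick
ctx_rate, mem_rate = knowledge_selection_rates(preds, examples)
```

Reporting both rates is what lets a method claim control in both directions, steering toward context or toward memory on demand.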

SPARE outperforms the existing representation engineering methods TaskVec, ActAdd, and SEA, showing superior control over both contextual and parametric knowledge usage. It also outperforms contrastive decoding strategies such as DoLa and CAD, which are effective at enhancing contextual-knowledge use but struggle to control parametric knowledge. SPARE’s ability to add and remove specific functional features yields more precise control over both knowledge types. Further, SPARE outperforms non-inference-time control approaches such as ICL, highlighting its efficiency and effectiveness. These results underscore SPARE’s potential for practical applications requiring real-time control over LLM behavior.
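For contrast, the context-aware decoding (CAD) baseline reweights next-token logits by comparing runs with and without the retrieved context. The sketch below shows the commonly cited adjustment, (1 + α)·logits_with_context − α·logits_without_context, with invented toy logits; note it can only amplify contextual knowledge, not selectively restore parametric knowledge:

```python
def cad_logits(logits_with_ctx, logits_without_ctx, alpha=0.5):
    """Amplify what the context contributes: (1 + a)*l_ctx - a*l_no_ctx."""
    return [(1 + alpha) * a - alpha * b
            for a, b in zip(logits_with_ctx, logits_without_ctx)]

# Token 0 is supported by the context; token 1 by parametric memory.
adjusted = cad_logits([2.0, 0.5], [1.0, 1.5])
```

This also requires a second forward pass without the context, which is exactly the kind of extra model interaction that SPARE's single-pass activation edit avoids.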

In conclusion, the researchers introduced SPARE, which addresses context-memory knowledge conflicts in LLMs by examining the model’s residual stream and applying training-free representation engineering. The method’s ability to control knowledge-selection behavior without computational overhead represents a significant advance in LLM knowledge management. Some limitations remain, including the dependency on pre-trained SAEs and the current focus on specific open-domain question-answering (ODQA) tasks. Despite these constraints, SPARE’s ability to improve knowledge-selection accuracy while maintaining efficiency makes it a promising solution for managing knowledge conflicts in practical LLM applications.



