MarkTechPost@AI · March 21
KBLAM: Efficient Knowledge Base Augmentation for Large Language Models Without Retrieval Overhead

KBLAM is a novel method designed to enhance large language models (LLMs) by integrating a knowledge base (KB) directly into their attention layers. Unlike traditional retrieval-augmented generation (RAG) and in-context learning, KBLAM encodes KB entries as continuous key-value vector pairs and integrates them into the LLM through a specialized attention mechanism. This removes the need for external retrieval and scales linearly with KB size, allowing large amounts of knowledge to be incorporated efficiently while preserving interpretability and supporting dynamic knowledge updates, with strong performance on question-answering and reasoning tasks.

💡 At its core, KBLAM converts knowledge base (KB) information into key-value vector pairs and embeds them directly into the attention layers of a large language model, enabling efficient knowledge integration.

🔑 KBLAM uses a pre-trained sentence encoder with linear adapters to convert KB triples into continuous key-value embeddings, called knowledge tokens, which are incorporated into every attention layer for efficient retrieval.

🚀 Compared with RAG and in-context learning, KBLAM needs no external retriever, scales linearly with KB size, supports dynamic updates, and reduces hallucinations, improving scalability while preserving the model's original capabilities.

✅ KBLAM is trained via instruction tuning on synthetic data, which improves reliability: it refuses to answer when relevant knowledge is missing, reducing hallucinations, and it demonstrates strong retrieval accuracy in experiments.

📈 Experimental results show that KBLAM excels at knowledge retrieval and reasoning, matching the performance of in-context learning while significantly reducing memory usage and scaling to as many as 10K triples.

LLMs have demonstrated strong reasoning and knowledge capabilities, yet they often require external knowledge augmentation when their internal representations lack specific details. One method for incorporating new information is supervised fine-tuning, where models are trained on additional datasets to update their weights. However, this approach is inefficient as it requires retraining whenever new knowledge is introduced and may lead to catastrophic forgetting, degrading the model’s performance on general tasks. To overcome these limitations, alternative techniques that preserve the model’s weights have gained popularity. RAG is one approach that retrieves relevant knowledge from unstructured text and appends it to the input query before passing it through the model. By dynamically retrieving information, RAG enables LLMs to access large knowledge bases while maintaining a smaller context size. However, as long-context models such as GPT-4 and Gemini have emerged, researchers have explored in-context learning, where external knowledge is directly provided in the model’s input. This eliminates the need for retrieval but comes with computational challenges, as processing long contexts requires significantly more memory and time.
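
For concreteness, the retrieve-then-append step that RAG performs can be sketched in a few lines of Python; the `embed` function below is a stand-in for a real sentence encoder, and all names are illustrative rather than taken from any particular system:

```python
import numpy as np

def embed(texts):
    """Stand-in for a sentence encoder: deterministic pseudo-embeddings.
    A real RAG system would call an actual encoder here."""
    vecs = []
    for t in texts:
        r = np.random.default_rng(abs(hash(t)) % (2**32))
        v = r.normal(size=384)
        vecs.append(v / np.linalg.norm(v))
    return np.stack(vecs)

def rag_prompt(query, documents, k=2):
    """Retrieve the k most similar documents and prepend them to the
    query, i.e. the append-to-input step described above."""
    scores = embed(documents) @ embed([query])[0]  # cosine similarity (unit vectors)
    top = np.argsort(-scores)[:k]
    context = "\n".join(documents[i] for i in top)
    return f"Context:\n{context}\n\nQuestion: {query}"

print(rag_prompt("Where is KBLAM from?",
                 ["KBLAM was proposed by JHU and Microsoft.",
                  "RAG retrieves text before generation."]))
```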

Several advanced techniques have been developed to enhance LLMs’ ability to integrate external knowledge more efficiently. Structured attention mechanisms improve memory efficiency by segmenting the context into independent sections, reducing the computational load of self-attention. Key-value (KV) caching optimizes response generation by storing precomputed embeddings at different layers, allowing the model to recall relevant information without recalculating it. This reduces the complexity from quadratic to linear with respect to context length. Unlike traditional KV caching, which requires full recomputation when the input changes, newer methods allow selective updates, making external knowledge integration more flexible.
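
The KV-caching idea can likewise be illustrated with a generic sketch (not any library's actual implementation): key and value rows are computed once per token and appended to a cache, so each decoding step attends over stored rows instead of recomputing the whole prefix:

```python
import numpy as np

def attend(q, K, V):
    """Single-head attention for one new query against the cached keys/values."""
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ V

d = 64
K_cache = np.empty((0, d))                     # keys/values computed once, then reused
V_cache = np.empty((0, d))
rng = np.random.default_rng(0)
for step in range(5):                          # incremental decoding
    k_new, v_new, q = rng.normal(size=(3, d))  # projections for the newest token
    K_cache = np.vstack([K_cache, k_new])      # append one row instead of recomputing all
    V_cache = np.vstack([V_cache, v_new])
    out = attend(q, K_cache, V_cache)          # cost grows linearly with context length
```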

Researchers from Johns Hopkins University and Microsoft propose a Knowledge Base Augmented Language Model (KBLAM), a method for integrating external knowledge into LLMs. KBLAM converts structured knowledge base (KB) triples into key-value vector pairs, seamlessly embedding them within the LLM’s attention layers. Unlike RAG, it eliminates external retrievers, and unlike in-context learning, it scales linearly with KB size. KBLAM enables efficient dynamic updates without retraining and enhances interpretability. Trained using instruction tuning on synthetic data, it improves reliability by refusing to answer when relevant knowledge is absent, reducing hallucinations and enhancing scalability.
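
A minimal sketch of the triple-to-key-value conversion might look as follows, assuming a frozen sentence encoder (mocked here) and learned linear adapters; the key/value phrasing and dimensions are illustrative assumptions, not the authors' exact design:

```python
import numpy as np

D_ENC, D_HEAD = 384, 64                 # encoder width, per-head key/value width (assumed)
rng = np.random.default_rng(0)
W_key = rng.normal(size=(D_ENC, D_HEAD)) * 0.02   # learned linear adapters
W_val = rng.normal(size=(D_ENC, D_HEAD)) * 0.02   # (trained, unlike the frozen encoder)

def encode(text):
    """Stand-in for a frozen pre-trained sentence encoder:
    deterministic pseudo-embedding for illustration only."""
    r = np.random.default_rng(abs(hash(text)) % (2**32))
    v = r.normal(size=D_ENC)
    return v / np.linalg.norm(v)

def knowledge_token(name, prop, value):
    """Map one KB triple to a continuous (key, value) pair: the key encodes
    what the triple is about, the value encodes its content."""
    key = encode(f"{prop} of {name}") @ W_key
    val = encode(value) @ W_val
    return key, val

k, v = knowledge_token("KBLAM", "institution", "Johns Hopkins University and Microsoft")
print(k.shape, v.shape)   # (64,) (64,)
```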

KBLAM enhances LLMs by integrating a KB through two steps. First, each KB triple is converted into continuous key-value embeddings, termed knowledge tokens, using a pre-trained sentence encoder and linear adapters. These tokens are then incorporated into each attention layer via a rectangular attention structure, allowing efficient retrieval without altering the LLM’s core parameters. This method ensures scalability, mitigates positional bias, and maintains reasoning abilities. Additionally, instruction tuning optimizes the knowledge token projection without modifying the LLM, using a synthetic KB to prevent memorization. This approach efficiently integrates large KBs while preserving the model’s original capabilities.
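
The rectangular attention structure can be sketched as follows: prompt queries attend over all knowledge tokens plus the causal prompt prefix, while knowledge tokens never act as queries, so the extra attention block is an n_prompt x n_kb rectangle and its cost grows linearly with KB size. Shapes and masking details here are assumptions for illustration:

```python
import numpy as np

def rectangular_attention(Q, K_prompt, V_prompt, K_kb, V_kb):
    n, d = Q.shape
    m = K_kb.shape[0]
    K = np.vstack([K_kb, K_prompt])        # (m + n, d): KB columns first
    V = np.vstack([V_kb, V_prompt])
    scores = Q @ K.T / np.sqrt(d)          # (n, m + n) "rectangular" score matrix
    # causal mask applies only to the prompt-prompt block;
    # every query can see every knowledge token
    causal = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores[:, m:][causal] = -np.inf
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ V                           # knowledge tokens are attended to, never attend

rng = np.random.default_rng(0)
n, m, d = 4, 100, 64                       # 4 prompt tokens, 100 knowledge tokens
out = rectangular_attention(rng.normal(size=(n, d)),
                            rng.normal(size=(n, d)), rng.normal(size=(n, d)),
                            rng.normal(size=(m, d)), rng.normal(size=(m, d)))
print(out.shape)                           # (4, 64)
```

Because knowledge tokens do not attend to one another, adding a triple only adds one column to this matrix, which is what keeps memory and compute linear in KB size.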

The empirical evaluation of KBLAM demonstrates its effectiveness as a knowledge retrieval and reasoning model. After instruction tuning, its attention matrix exhibits interpretable patterns, allowing accurate retrieval. KBLAM achieves performance comparable to in-context learning while significantly reducing memory usage and maintaining scalability up to 10K triples. It can also refuse to answer when no relevant knowledge is found, with “over-refusal” setting in later than it does for in-context learning. The model is trained on an instruction-tuned Llama3-8B and optimized using AdamW. Evaluation on synthetic and Enron datasets confirms KBLAM’s strong retrieval accuracy, efficient knowledge integration, and ability to minimize hallucinations.
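
A hedged sketch of what that training setup implies: the backbone LLM stays frozen and only the linear adapters that produce knowledge tokens receive AdamW updates. The dimensions, dummy loss, and module names below are placeholders, not the authors' configuration:

```python
import torch
from torch import nn

backbone = nn.TransformerEncoderLayer(d_model=64, nhead=4)   # stand-in for the frozen LLM
for p in backbone.parameters():
    p.requires_grad_(False)                                  # no gradients into the backbone

adapters = nn.ModuleDict({
    "key":   nn.Linear(384, 64, bias=False),   # sentence embedding -> knowledge-token key
    "value": nn.Linear(384, 64, bias=False),   # sentence embedding -> knowledge-token value
})
optimizer = torch.optim.AdamW(adapters.parameters(), lr=1e-4, weight_decay=0.01)

# One illustrative step with a dummy loss: gradients reach only the adapters.
emb = torch.randn(8, 384)                      # frozen sentence-encoder output for 8 triples
loss = adapters["key"](emb).pow(2).mean() + adapters["value"](emb).pow(2).mean()
loss.backward()
optimizer.step()
```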

In conclusion, KBLAM is an approach for enhancing LLMs with external KBs. It encodes KB entries as continuous key-value vector pairs using pre-trained sentence encoders with linear adapters and integrates them into LLMs through a specialized attention mechanism. Unlike Retrieval-Augmented Generation, KBLAM removes external retrieval modules, and unlike in-context learning, it scales linearly with KB size. This enables efficient integration of over 10K triples into an 8B LLM within an 8K context window on a single A100 GPU. Experiments show its effectiveness in question-answering and reasoning tasks while maintaining interpretability and enabling dynamic knowledge updates.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

