MarkTechPost@AI, July 20, 2024
EM-LLM: A Novel and Flexible Architecture that Integrates Key Aspects of Human Episodic Memory and Event Cognition into Transformer-based Language Models

EM-LLM is a novel architecture that integrates key aspects of human episodic memory and event cognition into Transformer-based large language models (LLMs), enabling them to handle much longer contexts. It divides the context into initial tokens, evicted tokens (managed by an episodic memory model), and local context. The architecture forms memories during inference by segmenting token sequences into events based on surprise, then refines the boundaries with graph-theoretic metrics that optimize cohesion and separation. Memory retrieval uses a two-stage mechanism: a k-NN search retrieves similar events, while a contiguity buffer preserves temporal context. This approach mimics human episodic memory and enhances the model's ability to process extended contexts and perform complex temporal reasoning efficiently.

🤔 EM-LLM is a novel architecture that integrates key aspects of human episodic memory and event cognition into Transformer-based LLMs, enabling them to handle longer contexts. It divides the context into initial tokens, evicted tokens (managed by an episodic memory model), and local context, mimicking human episodic memory and improving the model's ability to process extended contexts and perform complex temporal reasoning.

🧠 At the core of EM-LLM is its episodic memory model, which forms memories by segmenting token sequences into events. Events are delimited by surprise (how unexpected the incoming information is) and refined with graph-theoretic metrics to ensure cohesion within events and separation between them.

🔍 To retrieve memories relevant to the current task, EM-LLM uses a two-stage mechanism: a k-NN search retrieves similar events, while a contiguity buffer maintains temporal context, ensuring the model can account for the order in which events occurred.

📈 Results show that EM-LLM delivers notable gains on long-context tasks. On the LongBench dataset it surpassed the baseline InfLLM model on all but one task, with an overall improvement of 1.8 percentage points. EM-LLM also performed strongly on the PassageRetrieval task, with up to a 33% improvement, and achieved a 9.38% improvement on HotpotQA.

🚀 EM-LLM opens new possibilities for LLMs to process extended contexts and could fundamentally change how LLMs support continuous, personalized interactions. This flexible framework offers an alternative to traditional RAG techniques and provides a scalable computational model for testing hypotheses about human memory.

Despite their expanding capabilities, large language models (LLMs) struggle to process extensive contexts. These limitations stem from Transformer-based architectures struggling to extrapolate beyond their training window size. Processing long token sequences requires substantial computational resources and risks producing noisy attention embeddings. These constraints hinder LLMs’ ability to incorporate domain-specific, private, or up-to-date information effectively. Researchers have attempted various approaches, including retrieval-based methods, but a significant performance gap remains between short- and long-context tasks, even when employing existing long-context architectures.

Researchers have explored various approaches to extend LLMs’ context windows, focusing on improving softmax attention, reducing computational costs, and enhancing positional encodings. Retrieval-based methods, particularly group-based k-NN retrieval, have shown promise by retrieving large token groups and functioning as hierarchical attention.

Concurrently, research in neural models of episodic memory has provided insights into brain processes for storing experiences. These models highlight the importance of surprise-based event segmentation and temporal dynamics in memory formation and retrieval. Studies reveal that transformer-based LLMs exhibit temporal contiguity and asymmetry effects similar to human memory retrieval, suggesting potential for functioning as episodic memory retrieval models with appropriate context information.

Researchers from Huawei Noah’s Ark Lab and University College London propose EM-LLM, a unique architecture integrating episodic memory into Transformer-based LLMs, enabling them to handle significantly longer contexts. It divides the context into initial tokens, evicted tokens (managed by an episodic memory model), and local context. The architecture forms memories by segmenting token sequences into events based on surprise levels during inference, refining boundaries using graph-theoretic metrics to optimize cohesion and separation. Memory retrieval employs a two-stage mechanism: k-NN search retrieves similar events, while a contiguity buffer maintains temporal context. This approach mimics human episodic memory, enhancing the model’s ability to process extended contexts and perform complex temporal reasoning tasks efficiently.
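To make the surprise-based segmentation step concrete, here is a minimal PyTorch sketch. It assumes per-token surprise is the model's negative log-likelihood and uses a simple mean-plus-standard-deviation threshold; the function name, the `gamma` parameter, and the exact thresholding rule are illustrative assumptions rather than the authors' formulation, and the subsequent graph-theoretic boundary refinement is omitted.

```python
import torch
import torch.nn.functional as F

def surprise_based_segmentation(logits: torch.Tensor,
                                token_ids: torch.Tensor,
                                gamma: float = 1.0) -> list[tuple[int, int]]:
    """Split a token sequence into events where per-token surprise
    (negative log-likelihood under the LLM) spikes above a threshold.

    logits: [T, vocab_size] next-token logits; token_ids: [T] token indices.
    """
    # surprise_t = -log p(x_t | x_<t); logits at position t-1 predict token t
    log_probs = F.log_softmax(logits[:-1], dim=-1)
    surprise = -log_probs.gather(-1, token_ids[1:].unsqueeze(-1)).squeeze(-1)

    # Place an event boundary wherever surprise exceeds mean + gamma * std
    # (a hypothetical rule standing in for the paper's threshold).
    threshold = surprise.mean() + gamma * surprise.std()
    boundaries = (surprise > threshold).nonzero(as_tuple=True)[0] + 1

    # Convert boundary positions into (start, end) event spans over the sequence
    events, start = [], 0
    for b in boundaries.tolist():
        events.append((start, b))
        start = b
    events.append((start, token_ids.size(0)))
    return events
```

In the full method, these initial boundaries would then be adjusted with graph-theoretic metrics (cohesion within events, separation between them) computed over the attention keys, which this sketch does not attempt to reproduce.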

EM-LLM extends pre-trained LLMs to handle larger context lengths. It divides the context into initial tokens, evicted tokens, and local context. The local context uses full softmax attention, representing the most recent and relevant information. Evicted tokens, managed by a memory model similar to short-term episodic memory, comprise the majority of past tokens. Initial tokens act as attention sinks. For retrieved tokens outside the local context, EM-LLM assigns fixed position embeddings. This architecture allows EM-LLM to process information beyond its pre-trained context window while maintaining performance characteristics.
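The two-stage retrieval described above might look roughly like the following sketch. The event representations (for example, a mean key vector per event), the `contiguity` parameter, and the function name are illustrative assumptions; the actual implementation operates on attention keys per layer and head rather than on a single vector per event.

```python
import torch

def retrieve_events(query: torch.Tensor,
                    event_reprs: torch.Tensor,
                    k: int = 4,
                    contiguity: int = 1) -> list[int]:
    """Stage 1: k-NN search over event-level representations.
    Stage 2: a contiguity buffer adds temporally neighbouring events
    so retrieved memories preserve their original order.

    query: [d] query representation; event_reprs: [num_events, d].
    """
    # Stage 1: similarity-based retrieval of the k most relevant events
    scores = event_reprs @ query                                   # [num_events]
    topk = torch.topk(scores, k=min(k, event_reprs.size(0))).indices.tolist()

    # Stage 2: extend each hit with +/- `contiguity` neighbouring events
    selected = set()
    for idx in topk:
        for j in range(idx - contiguity, idx + contiguity + 1):
            if 0 <= j < event_reprs.size(0):
                selected.add(j)

    # Return events in temporal order; their tokens would then be placed
    # outside the local context and assigned fixed position embeddings,
    # as described above.
    return sorted(selected)
```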

EM-LLM demonstrated improved performance on long-context tasks compared to the baseline InfLLM model. On the LongBench dataset, EM-LLM surpassed InfLLM in all but one task, achieving an overall increase of 1.8 percentage points (4.3% relative improvement). Also, EM-LLM showed significant gains on the PassageRetrieval task, with up to a 33% improvement, and a 9.38% improvement on the HotpotQA task. These results highlight EM-LLM’s enhanced ability to recall detailed information from large contexts and perform complex reasoning over multiple supporting documents. The study also found that surprise-based segmentation methods closely aligned with human event perception, outperforming fixed or random event segmentation approaches.

EM-LLM represents a significant advancement in language models with extended context-processing capabilities. By integrating human episodic memory and event cognition into transformer-based LLMs, it effectively processes information from vastly extended contexts without pre-training. The combination of surprise-based event segmentation, graph-theoretic boundary refinement, and two-stage memory retrieval enables superior performance on long-context tasks. EM-LLM offers a path towards virtually infinite context windows, potentially revolutionizing LLM interactions with continuous, personalized exchanges. This flexible framework serves as an alternative to traditional RAG techniques and provides a scalable computational model for testing human memory hypotheses. By bridging cognitive science and machine learning, EM-LLM not only enhances LLM performance but also inspires further research at the intersection of LLMs and human memory mechanisms.


Check out the Paper. All credit for this research goes to the researchers of this project.

The post EM-LLM: A Novel and Flexible Architecture that Integrates Key Aspects of Human Episodic Memory and Event Cognition into Transformer-based Language Models appeared first on MarkTechPost.


Related tags

EM-LLM, Large Language Models, Episodic Memory, Event Cognition, Context Processing