MarkTechPost@AI · July 20, 07:00
MemAgent: A Reinforcement Learning Framework Redefining Long-Context Processing in LLMs

Large language models (LLMs) often suffer performance degradation and high computational cost when processing extremely long documents. To address this, researchers propose MemAgent, a reinforcement-learning-based memory agent. By mimicking the human strategy of summarizing key information, it uses a fixed-length, token-based memory with a segment-wise overwrite mechanism, achieving linear complexity and near-lossless accuracy while efficiently handling inputs of millions of tokens without any change to the model architecture. MemAgent performs strongly on multiple benchmarks and opens new possibilities for LLMs in long-text understanding and generation.

🧠 MemAgent mimics human memory strategies by breaking long-document processing into a sequence of dialogues and training the model with reinforcement learning (GRPO) to compress and extract information effectively. Its fixed-length token memory and segment-wise overwrite mechanism keep memory size constant no matter how long the text grows, while remaining compatible with standard models.

💡 MemAgent's key innovation is its "segment-wise overwrite" mechanism: as the model reads each new document chunk, it intelligently updates and compresses its internal memory, allowing it to handle inputs of arbitrary length without extra computational burden. This lets LLMs manage and exploit information spanning millions of tokens, overcoming the scalability and performance limits of conventional approaches.

🚀 In performance evaluations, MemAgent stands out on the RULER benchmark and on datasets such as HotpotQA and SQuAD. Although trained with only an 8K context window, it extrapolates successfully to 3.5 million tokens and maintains over 95% accuracy on RULER, clearly outperforming existing long-context and distillation methods and demonstrating strong, stable handling of extremely long texts.

💬 A multi-hop question-answering case study shows MemAgent's ability to track and update information precisely. Even when it encounters irrelevant content, it retains key facts and revises its memory when more accurate information appears, ultimately extracting the correct geographic answer from millions of tokens of text and validating its usefulness for complex information-retrieval tasks.

Handling extremely long documents remains a persistent challenge for large language models (LLMs). Even with techniques such as length extrapolation and sparse attention, models often suffer from performance degradation and high computational costs. To address this, researchers from ByteDance Seed and Tsinghua University introduce MemAgent, a reinforcement learning-based memory agent designed to enable long-context processing with linear complexity and minimal performance loss.

Limitations of Existing Approaches

Current solutions for long-context modeling fall into three main categories: length-extrapolation techniques (e.g., positional-embedding scaling), sparse or linear attention mechanisms, and context compression or distillation methods.

These approaches fail to deliver all three critical attributes: arbitrary input length support, consistent accuracy, and efficient linear complexity.

MemAgent: Human-Like Memory Strategy

Inspired by how humans summarize key information while ignoring noise, MemAgent processes input as a stream of evidence. At each step, it reads a document chunk and an internal memory, overwriting the latter with updated, compressed context.

Key innovations:

- Fixed-length, token-based memory that never grows with input length.
- Segment-wise overwrite mechanism: each new chunk is compressed into the existing memory rather than appended.
- Linear computational complexity and full compatibility with standard model architectures.
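To make the overwrite loop concrete, here is a minimal sketch of the chunked read-and-update cycle. The function names, prompt wording, and the `llm` callable are illustrative assumptions, not the authors' actual API.

```python
from typing import Callable, List

def memagent_process(
    llm: Callable[[str], str],    # any text-in/text-out LLM call (assumed interface)
    document: str,
    question: str,
    chunk_size: int = 5000,       # tokens approximated by characters here for simplicity
    memory_budget: int = 1024,    # fixed-length memory: never grows with input size
) -> str:
    """Sketch of MemAgent-style processing: read chunks, overwrite a fixed memory, answer at the end."""
    # Split the long document into fixed-size chunks (a real system would chunk by tokens).
    chunks: List[str] = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]

    memory = ""  # the internal memory starts empty and is fully rewritten at every step
    for chunk in chunks:
        # Each step sees only (current memory, one chunk): per-step cost is constant,
        # so total cost grows linearly with the number of chunks.
        prompt = (
            f"Question: {question}\n"
            f"Current memory:\n{memory}\n"
            f"New evidence:\n{chunk}\n"
            "Rewrite the memory, keeping only information useful for answering the question."
        )
        memory = llm(prompt)[:memory_budget]  # overwrite: old memory is replaced, not appended

    # The final answer is produced from the compressed memory alone.
    return llm(f"Question: {question}\nMemory:\n{memory}\nAnswer concisely.")
```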

Multi-Conv RL Training with GRPO

MemAgent treats each document-chunk interaction as an independent dialogue. It is trained with Group Relative Policy Optimization (GRPO) inside a multi-conversation extension of the DAPO RL pipeline, enabling reward-driven memory updates.

Key elements include: sampling groups of multi-conversation rollouts per question, a rule-based outcome reward computed from the final answer, and propagation of that reward to every intermediate memory-update step.

This setup encourages memory compression that focuses on answer-relevant information and discards distractors.
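Below is a minimal sketch of the group-relative reward signal this kind of training relies on. The rollout structure, the reward rule, and all names are assumptions made for illustration, not the authors' implementation.

```python
import statistics
from dataclasses import dataclass
from typing import List

@dataclass
class Rollout:
    """One sampled trajectory: a sequence of memory-update conversations plus a final answer."""
    memory_updates: List[str]   # one generated memory per document chunk
    final_answer: str

def outcome_reward(final_answer: str, gold_answer: str) -> float:
    # Rule-based verifier: 1 if the gold answer string appears in the prediction, else 0.
    return 1.0 if gold_answer.lower() in final_answer.lower() else 0.0

def group_relative_advantages(rollouts: List[Rollout], gold_answer: str) -> List[float]:
    """GRPO-style advantage: normalize each rollout's reward against the group statistics.

    The same outcome-based advantage is assigned to every conversation in a rollout,
    so each intermediate memory update is credited for the final answer it enabled.
    """
    rewards = [outcome_reward(r.final_answer, gold_answer) for r in rollouts]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero when all rewards are equal
    return [(r - mean) / std for r in rewards]

# Usage: sample a group of rollouts for the same question, score them, and use the
# advantages to weight the policy-gradient update on every conversation in each rollout.
```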

Performance Evaluation

Using the RULER benchmark and synthetic datasets from HotpotQA and SQuAD, MemAgent was trained with an 8K context window and extrapolated up to 3.5 million tokens.

Model                     | 224K tokens | 896K tokens | 3.5M tokens
Qwen2.5-Instruct-14B-1M   | 37.5%       | 0.0%        | N/A
QwenLong-L1-32B           | 17.2%       | 11.7%       | N/A
RL-MemAgent-14B           | 81.3%       | 77.3%       | 78.1%

MemAgent maintained over 95% accuracy on RULER benchmarks (8K to 512K tokens) and consistently outperformed long-context and distillation-based baselines.

Case Study: Multi-Hop QA

Given the query “The director of the romantic comedy ‘Big Stone Gap’ is based in what New York city?”, MemAgent progressively tracked relevant content across 3 chunks:

1. Recognized unrelated content but retained location information.
2. Maintained memory against irrelevant chunks.
3. Correctly updated memory upon encountering Adriana Trigiani's biography.

Final answer: Greenwich Village, New York City.

Theoretical Foundation and Complexity

MemAgent reformulates the autoregressive model using latent memory variables m₁ … m_K, one per document chunk c₁ … c_K:

p(x₁:N) = ∑_{m₁:K} ∏ₖ p(cₖ | mₖ₋₁) · p(mₖ | cₖ, mₖ₋₁)

This enables O(N) compute cost and human-readable intermediate memory—unlike attention-based feature compression. RL is essential, as memory updates are discrete and can’t be learned via backpropagation.
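The linear-cost claim follows from a simple accounting argument. The derivation below is a back-of-envelope sketch with assumed symbols (chunk length c and fixed memory length m), not a formula taken from the paper.

```latex
% Each step attends over at most c + m tokens, so its cost is bounded by a
% constant that does not depend on the total input length N:
\text{per-step cost} = O\big((c+m)^2\big) = O(1) \quad \text{w.r.t. } N
% An N-token input is split into K = \lceil N/c \rceil chunks, hence
\text{total cost} = \sum_{k=1}^{K} O\big((c+m)^2\big) = O(K) = O(N/c) = O(N)
```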

Conclusion

MemAgent offers a scalable and efficient solution to the long-context trilemma: unlimited input length, near-lossless accuracy, and linear complexity. Its RL-based overwrite memory mechanism allows LLMs to read, abstract, and generate over multi-million-token inputs without architectural modification.


FAQs

Q1: What is MemAgent?
MemAgent is a reinforcement learning-based framework that equips LLMs with memory tokens to handle extremely long contexts efficiently.

Q2: How is it different from attention or extrapolation methods?
Unlike attention-based scaling or extrapolation techniques, MemAgent uses token-based memory updated via reinforcement learning.

Q3: What models can MemAgent be applied to?
Any Transformer-based LLM. No changes to the model architecture are required.

Q4: How does it scale with input size?
It maintains linear computational complexity regardless of input length by fixing the memory size.

Q5: What are the applications of MemAgent?
Long-document QA, agent memory systems, legal document review, scientific literature analysis, and real-time decision-making with large evidence bases.


Check out the Paper. All credit for this research goes to the researchers of this project.


The post MemAgent: A Reinforcement Learning Framework Redefining Long-Context Processing in LLMs appeared first on MarkTechPost.
