MarkTechPost@AI · December 28, 2024
Google DeepMind Introduces Differentiable Cache Augmentation: A Coprocessor-Enhanced Approach to Boost LLM Reasoning and Efficiency

Google DeepMind has proposed a new method called Differentiable Cache Augmentation, which trains a coprocessor to augment a large language model's key-value cache, improving the model's reasoning ability and efficiency. Without adding to the computational burden, the coprocessor operates asynchronously to supply the LLM with additional latent embeddings that enrich its internal memory. Experiments show the method delivers significant performance gains across multiple benchmarks, especially on complex reasoning tasks; on the GSM8K dataset, for example, accuracy improved by 10.05%. The approach requires no task-specific fine-tuning, demonstrating its robustness and adaptability.

🧠 Core innovation: introduces Differentiable Cache Augmentation, in which a coprocessor asynchronously adds latent embeddings to the LLM's key-value cache, enriching the model's internal memory.

⚙️ Three-stage pipeline: first, the frozen LLM generates a key-value cache; next, the coprocessor processes that cache and produces latent embeddings; finally, the augmented cache is fed back into the LLM to generate richer outputs.

📈 Performance gains: on GSM8K, accuracy improved by 10.05% with 64 latent embeddings; on MMLU, performance improved by 4.70%. Perplexity reductions were also observed at multiple token positions.

💡 Scalability: experiments show the method's effectiveness grows with the number of latent embeddings, with accuracy improving steadily on datasets such as GSM8K.

💪 Robustness: without any task-specific fine-tuning, the method outperforms baseline models across multiple benchmarks, demonstrating strong adaptability.

Large language models (LLMs) are integral to solving complex problems across language processing, mathematics, and reasoning domains. Enhancements in computational techniques focus on enabling LLMs to process data more effectively, generating more accurate and contextually relevant responses. As these models become more complex, researchers strive to develop methods that operate within fixed computational budgets without sacrificing performance.

One major challenge in optimizing LLMs is their inability to effectively reason across multiple tasks or perform computations beyond their pre-trained architecture. Current methods for improving model performance involve generating intermediate steps during task processing, often at the cost of increased latency and computational inefficiency. This limitation hampers their ability to perform complex reasoning tasks, particularly those requiring longer dependencies or higher accuracy in predictions.

Researchers have explored methods like Chain-of-Thought (CoT) prompting, which guides LLMs to reason step by step. While effective in some cases, CoT relies on sequential processing of intermediate reasoning steps, leading to slower computation times. KV-cache compression has also been proposed to reduce memory usage but does little to improve reasoning capabilities. These approaches, though valuable, underscore the need for a method that combines efficiency with enhanced reasoning ability.
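For contrast, here is a minimal illustration of what Chain-of-Thought prompting looks like in practice. The questions and the worked exemplar are hypothetical, chosen only to show the format; they are not taken from the paper or its benchmarks.

```python
# Minimal illustration of Chain-of-Thought (CoT) prompting.
# All questions below are made-up placeholders.

direct_prompt = (
    "Q: A library has 5 shelves with 40 books each and lends out 23 books. "
    "How many books remain on site?\n"
    "A:"
)

cot_prompt = (
    # One worked exemplar that spells out the intermediate steps...
    "Q: A farmer has 3 pens with 12 chickens each and sells 7 chickens. "
    "How many chickens remain?\n"
    "A: Let's think step by step. 3 * 12 = 36 chickens in total. "
    "36 - 7 = 29. The answer is 29.\n\n"
    # ...followed by the actual question, so the model imitates the pattern.
    "Q: A library has 5 shelves with 40 books each and lends out 23 books. "
    "How many books remain on site?\n"
    "A: Let's think step by step."
)

# The cost: every intermediate step is generated as text at inference time,
# so latency grows with the length of the reasoning chain.
print(cot_prompt)
```

This sequential, text-level reasoning is exactly what Differentiable Cache Augmentation sidesteps by moving the extra computation into latent embeddings produced asynchronously.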

Researchers from Google DeepMind have introduced a method called Differentiable Cache Augmentation. This technique uses a trained coprocessor to augment the LLM’s key-value (kv) cache with latent embeddings, enriching the model’s internal memory. The key innovation lies in keeping the base LLM frozen while training the coprocessor, which operates asynchronously. The researchers designed this method to enhance reasoning capabilities without increasing the computational burden during task execution.

The methodology revolves around a three-stage process. First, the frozen LLM generates a kv-cache from an input sequence, encapsulating its internal representation. This kv-cache is passed to the coprocessor, which processes it with additional trainable soft tokens. Not tied to specific words, these tokens act as abstract prompts for generating latent embeddings. Once processed, the augmented kv-cache is fed back into the LLM, enabling it to generate contextually enriched outputs. This asynchronous operation ensures the coprocessor's enhancements are applied efficiently without delaying the LLM's primary functions. The coprocessor is trained with a language modeling loss that updates only its own parameters, leaving the frozen LLM untouched. This targeted approach allows for scalable and effective optimization.
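To make the flow concrete, below is a minimal, self-contained PyTorch sketch of the three stages and the coprocessor-only training loss. The tiny transformer stand-ins, the dimensions, and the representation of the kv-cache as a single hidden-state tensor are simplifying assumptions for illustration, not the paper's architecture or code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch only: the kv-cache is simplified to a plain hidden-state
# tensor, and all module sizes and names are assumptions, not Gemma-2 2B.
d_model, num_latents, vocab_size = 512, 64, 32000

# Frozen components standing in for the pretrained LLM.
embed = nn.Embedding(vocab_size, d_model)
base_lm = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=4
)
lm_head = nn.Linear(d_model, vocab_size)
for module in (embed, base_lm, lm_head):
    for p in module.parameters():
        p.requires_grad = False          # the base LLM stays frozen throughout

# Trainable coprocessor plus learnable soft tokens (the abstract prompts).
coprocessor = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2
)
soft_tokens = nn.Parameter(torch.randn(num_latents, d_model) * 0.02)
optimizer = torch.optim.AdamW(list(coprocessor.parameters()) + [soft_tokens], lr=1e-4)

def training_step(input_ids: torch.Tensor, target_ids: torch.Tensor) -> float:
    # Stage 1: the frozen LLM encodes the input; `cache` stands in for the
    # kv-cache it would normally expose. No gradients are needed here.
    with torch.no_grad():
        cache = base_lm(embed(input_ids))

    # Stage 2: the coprocessor reads the cache together with the soft tokens
    # and emits latent embeddings.
    soft = soft_tokens.unsqueeze(0).expand(cache.size(0), -1, -1)
    latents = coprocessor(torch.cat([cache, soft], dim=1))[:, -num_latents:, :]

    # Stage 3: the augmented cache (original states + latents) is fed back
    # through the frozen LLM. Gradients flow through the frozen weights into
    # the latents, but only the coprocessor and soft tokens are updated.
    augmented = torch.cat([cache, latents], dim=1)
    logits = lm_head(base_lm(augmented))[:, : target_ids.size(1), :]

    loss = F.cross_entropy(logits.reshape(-1, vocab_size), target_ids.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: a batch of 2 sequences of 16 tokens each.
# loss = training_step(torch.randint(0, vocab_size, (2, 16)),
#                      torch.randint(0, vocab_size, (2, 16)))
```

The property the sketch tries to preserve is the one the paper emphasizes: the language-modeling loss backpropagates through the frozen model's operations, but only the coprocessor and its soft tokens are ever updated.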

Performance evaluations demonstrated significant improvements. The method was tested on the Gemma-2 2B model and delivered notable gains across various benchmarks. For instance, on the reasoning-intensive GSM8K dataset, accuracy improved by 10.05% when 64 latent embeddings were used. Similarly, MMLU performance increased by 4.70% under the same configuration. These enhancements underscore the model's ability to perform better on complex reasoning tasks. Perplexity reductions were also observed at multiple token positions: for example, perplexity decreased by 3.94% at position one and by 1.20% at position 32 when 64 latent embeddings were applied, showcasing the model's improved prediction capabilities over longer sequences.
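For readers unfamiliar with the per-position metric: perplexity at a token position is the exponential of the average negative log-likelihood the model assigns to the gold token at that position. A small sketch of the computation, using toy placeholder tensors rather than outputs from the actual models in the paper:

```python
import torch

def perplexity_at_position(logits: torch.Tensor, targets: torch.Tensor, pos: int) -> float:
    """logits: [batch, seq, vocab]; targets: [batch, seq]; pos: 0-based token position."""
    log_probs = torch.log_softmax(logits[:, pos, :], dim=-1)        # [batch, vocab]
    gold_log_probs = log_probs.gather(1, targets[:, pos:pos + 1])   # [batch, 1]
    return torch.exp(-gold_log_probs.mean()).item()

# Toy usage: compare augmented vs. baseline logits at position 32.
# ppl_baseline  = perplexity_at_position(baseline_logits,  targets, 32)
# ppl_augmented = perplexity_at_position(augmented_logits, targets, 32)
# relative_drop = 100.0 * (ppl_baseline - ppl_augmented) / ppl_baseline
```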

Further analysis showed that the augmentation’s effectiveness scales with the number of latent embeddings. For GSM8K, accuracy rose incrementally with additional embeddings, from 1.29% with four embeddings to the peak improvement of 10.05% with 64 embeddings. Similar trends were observed in other benchmarks like ARC and MATH, indicating the broader applicability of this method. The researchers confirmed that their approach consistently outperformed baseline models without task-specific fine-tuning, demonstrating its robustness and adaptability.

This work represents a significant step forward in enhancing LLMs’ reasoning capabilities. By introducing an external coprocessor to augment the kv-cache, the researchers from Google DeepMind have created a method that improves performance while maintaining computational efficiency. The results highlight the potential for LLMs to tackle more complex tasks, paving the way for further exploration into modular enhancements and scalable reasoning systems. This breakthrough underscores the importance of continual innovation in AI to meet the growing demands of reasoning-intensive applications.


Check out the Paper. All credit for this research goes to the researchers of this project.
