MarkTechPost@AI | October 22, 2024
Meta AI Releases LayerSkip: A Novel AI Approach to Accelerate Inference in Large Language Models (LLMs)

Meta AI has released LayerSkip, an innovative solution for accelerating inference in large language models (LLMs). LLMs are costly to run because of their high compute and memory demands; LayerSkip combines a unique training recipe with self-speculative decoding to improve efficiency without sacrificing accuracy, and the code has been open-sourced. Experimental results show significant speedups across a range of tasks.

📚 LayerSkip is an innovative end-to-end solution released by researchers from several institutions. It combines a unique training recipe with self-speculative decoding, using layer dropout and an early exit loss to create different sub-models within the main model.

💡 The method's inference strategy allows early exit at earlier layers to cut computational cost without compromising accuracy. Its self-speculative decoding scheme makes predictions at early layers, then verifies and corrects them with the remaining layers, sharing compute and activations to reduce the memory footprint.

🎉 LayerSkip consists of three main components: a training recipe, an inference strategy, and self-speculative decoding. Experiments show significant speedups across different Llama model sizes and a variety of tasks while maintaining performance comparable to baseline models.

Accelerating inference in large language models (LLMs) is challenging due to their high computational and memory requirements, leading to significant financial and energy costs. Current solutions, such as sparsity, quantization, or pruning, often require specialized hardware or result in decreased model accuracy, making efficient deployment difficult.

Researchers from FAIR at Meta, GenAI at Meta, Reality Labs, and several universities have released LayerSkip, an innovative end-to-end solution that combines a unique training recipe with self-speculative decoding. The proposed approach involves training with a layer dropout mechanism that applies low dropout rates to earlier layers and higher dropout rates to later ones while incorporating an early exit loss that enables transformer layers to share a common exit point. This helps the model become more robust to early exits during inference without the need for auxiliary layers.
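
To make the recipe concrete, the following minimal PyTorch sketch shows how a depth-increasing dropout ramp and a shared-head early exit loss might be wired together. It is an illustration under simplifying assumptions (a linear dropout schedule, uniform loss weights, and bare `layer(hidden)` calls), not the authors' implementation; the paper describes the exact schedules and curricula.

```python
import torch
import torch.nn.functional as F

def layerskip_training_loss(layers, norm, lm_head, hidden, targets,
                            max_dropout=0.2):
    """Sketch of the LayerSkip training recipe: layer dropout whose rate
    grows with depth, plus an early-exit loss through the shared LM head.
    The module handles (layers, norm, lm_head) are assumed, and the linear
    dropout ramp / uniform loss weights simplify the paper's schedules."""
    num_layers = len(layers)
    loss = hidden.new_zeros(())
    for i, layer in enumerate(layers):
        # Skip probability ramps linearly with depth: the first layer is
        # never dropped, the last is dropped with probability max_dropout.
        p_skip = max_dropout * i / (num_layers - 1)
        if torch.rand(()).item() >= p_skip:
            hidden = layer(hidden)
        # Early-exit loss: decode this layer's hidden state with the shared
        # head so that exiting here still yields usable next-token logits.
        logits = lm_head(norm(hidden))
        loss = loss + F.cross_entropy(logits.view(-1, logits.size(-1)),
                                      targets.view(-1))
    return loss / num_layers
```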

Furthermore, LayerSkip introduces a self-speculative decoding solution, where predictions are made at early layers, and verification and correction are performed with the remaining layers. The shared compute and activations between the draft and verification stages ensure a reduced memory footprint compared to other speculative decoding approaches.
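
A greedy version of this draft-and-verify loop can be sketched as follows. The `early_exit` argument on the forward pass is a hypothetical hook standing in for running only the first k layers, and the sketch recomputes the verification pass from scratch, whereas the actual method reuses the draft stage's compute and KV cache.

```python
import torch

@torch.no_grad()
def self_speculative_greedy(model, tokens, exit_layer=8, num_draft=6,
                            rounds=32):
    """Greedy self-speculative decoding sketch. `model.forward(ids,
    early_exit=k)` is a hypothetical hook returning logits from the first
    k layers (None = full depth)."""
    for _ in range(rounds):
        prompt_len = tokens.size(1)
        # Draft: cheaply propose num_draft tokens with the early-exit sub-model.
        draft = tokens
        for _ in range(num_draft):
            logits = model.forward(draft, early_exit=exit_layer)
            draft = torch.cat([draft, logits[:, -1:].argmax(-1)], dim=-1)
        # Verify: one full-depth pass scores every drafted position in
        # parallel; logits at position i predict the token at position i + 1.
        full_logits = model.forward(draft, early_exit=None)
        proposed = draft[0, prompt_len:]                        # (num_draft,)
        checked = full_logits[0, prompt_len - 1:-1].argmax(-1)  # (num_draft,)
        n_ok = int((proposed == checked).long().cumprod(0).sum())
        # Keep the agreed prefix plus one token from the full model: a
        # correction if the draft diverged, a free extra token otherwise.
        bonus = full_logits[0, prompt_len - 1 + n_ok].argmax(-1)
        tokens = torch.cat([draft[:, :prompt_len + n_ok],
                            bonus.view(1, 1)], dim=-1)
    return tokens
```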

LayerSkip consists of three main components:

    Training Recipe: Uses layer dropout and early exit loss to create different sub-models within the main model.
    Inference Strategy: Allows for early exits at earlier layers to reduce computational costs without compromising accuracy.
    Self-Speculative Decoding: Early predictions are validated and corrected using the remaining layers of the model.

This approach leverages shared weights, making it possible to skip layers and still obtain high-quality output while ensuring efficiency gains. Importantly, LayerSkip has been open-sourced, allowing researchers and developers to access and use the code available on GitHub.
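
For readers who want to try the released checkpoints, one plausible entry point is sketched below. Both the checkpoint name and the `assistant_early_exit` generation argument are assumptions rather than confirmed interfaces; consult the GitHub repository's README for the officially supported usage.

```python
# Hypothetical usage sketch; the checkpoint name and `assistant_early_exit`
# argument are assumptions -- check the LayerSkip README for the real API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "facebook/layerskip-llama2-7B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(
    ckpt, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Summarize: ...", return_tensors="pt").to(model.device)
# Draft with the first 8 layers, then verify with the full model.
out = model.generate(**inputs, max_new_tokens=128, assistant_early_exit=8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```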

The experimental results for LayerSkip show significant speed improvements across different Llama model sizes and various tasks, such as summarization, coding, and semantic parsing. For instance, LayerSkip achieved up to 2.16× speedup on CNN/DM summarization, 1.82× speedup on coding tasks, and 2.0× speedup on the TOPv2 semantic parsing task. By using layer dropout and early exit loss during training, the accuracy of early exits at earlier layers was improved while maintaining comparable performance to baseline models in the final layers. The self-speculative decoding approach also demonstrated memory and computational efficiency, allowing for more practical deployment of LLMs.

LayerSkip presents a promising solution for improving the efficiency of LLMs during inference while minimizing computational and memory overhead. By combining layer dropout, early exit loss, and self-speculative decoding, the researchers have proposed a novel approach that not only speeds up inference but also reduces memory requirements, making it feasible for large models to be deployed on commodity hardware. With the release of LayerSkip, the research community now has access to a practical and effective tool for optimizing LLM inference, potentially paving the way for more accessible AI deployment in real-world applications.


Check out the Paper, Model Series on Hugging Face, and GitHub. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter. Don’t forget to join our 50k+ ML SubReddit.
