MarkTechPost@AI · January 14
Apple Researchers Introduce Instruction-Following Pruning (IFPruning): A Dynamic AI Approach to Efficient and Scalable LLM Optimization

Researchers from Apple AI and UC Santa Barbara have proposed a dynamic pruning technique called IFPruning, aimed at optimizing large language models (LLMs). Unlike traditional static pruning methods, IFPruning generates pruning masks dynamically from the input, retaining only the parameters most relevant to the task at hand. The method targets the feed-forward network (FFN) layers, letting the model adapt to different tasks while effectively reducing computational demands. Trained through a two-stage process, IFPruning performs strongly across multiple benchmarks, notably improving accuracy on coding and math tasks, and shows solid multi-task ability. The technique offers a new path for deploying efficient LLMs on resource-constrained devices.

🚀 IFPruning is a dynamic pruning technique that generates pruning masks on the fly from the input, selecting the parameters most relevant to a given task in order to make the LLM more efficient.

⚙️ The method is trained in two stages: first the model is pre-trained on large-scale data, jointly optimizing the sparsity predictor and the LLM; then it is trained on supervised fine-tuning datasets, with pruning masks generated dynamically and no parameter reloading required.

🎯 IFPruning performs strongly across benchmarks: for example, pruning a 9-billion-parameter model down to 3 billion parameters improved accuracy by 8% on coding tasks and by 5% on math datasets, with gains of 4-6 percentage points on instruction-following evaluations.

Large language models (LLMs) have become crucial tools for applications in natural language processing, computational mathematics, and programming. Such models often require large-scale computational resources for both training and inference, which has led many researchers to devise techniques for optimizing them.

A central challenge in LLM optimization is that traditional pruning methods are fixed. Static pruning removes unnecessary parameters according to a prespecified mask, so the same parameters are dropped whether the application requires coding skill or mathematical problem solving. These methods lack flexibility: performance across multiple tasks is usually not maintained while computational resources are being optimized.
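To make the contrast concrete, below is a minimal sketch of static pruning in PyTorch. The magnitude-based scoring, the 50% sparsity level, and all names are illustrative assumptions rather than details from the paper; the point is that the mask is computed once and then applied identically to every input.

```python
# Minimal sketch of STATIC pruning (illustrative assumptions throughout):
# the mask is derived once from weight magnitudes and is fixed thereafter,
# regardless of whether the input is a coding prompt or a math prompt.
import torch
import torch.nn as nn

def static_magnitude_mask(linear: nn.Linear, sparsity: float = 0.5) -> torch.Tensor:
    """Build a fixed 0/1 mask that keeps the largest-magnitude output channels."""
    row_scores = linear.weight.abs().mean(dim=1)      # one score per output channel
    k = int(row_scores.numel() * (1.0 - sparsity))    # number of channels to keep
    mask = torch.zeros_like(row_scores)
    mask[row_scores.topk(k).indices] = 1.0
    return mask                                       # same mask for every input

layer = nn.Linear(4096, 11008)                        # e.g. an FFN up-projection
mask = static_magnitude_mask(layer)                   # computed once, never updated
x = torch.randn(2, 4096)
out = layer(x) * mask                                 # task-agnostic channel removal
```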

Historically, techniques such as static structured pruning and mixture-of-experts (MoE) architectures have been used to counter the computational inefficiencies of LLMs. Structured pruning removes components such as channels or attention heads from specific layers. Although these methods are hardware-friendly, they require full retraining to avoid a loss of model accuracy. MoE models, in turn, activate only parts of the model during inference but incur significant overhead from frequent parameter reloading.

Apple AI and UC Santa Barbara researchers have introduced a new technique called Instruction-Following Pruning (IFPruning), which dynamically adapts LLMs to the needs of a particular task. IFPruning uses a sparsity predictor that generates input-dependent pruning masks, selecting only the most relevant parameters for a given task. Unlike traditional methods, this dynamic approach focuses on feed-forward neural network (FFN) layers, allowing the model to adapt to diverse tasks while efficiently reducing computational demands.
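The mechanism can be sketched in a few lines of PyTorch. The module names, the mean-pooling over the prompt, the keep ratio, and the straight-through gradient trick below are assumptions made for illustration, not the paper's exact design; what the sketch shows is a small predictor reading the prompt and emitting a per-input 0/1 mask over FFN hidden channels.

```python
# Sketch of input-dependent FFN masking (names and details are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparsityPredictor(nn.Module):
    """Scores FFN hidden channels from the prompt and emits a 0/1 keep-mask."""
    def __init__(self, d_model: int, d_ffn: int, keep_ratio: float = 0.33):
        super().__init__()
        self.scorer = nn.Linear(d_model, d_ffn)
        self.k = int(d_ffn * keep_ratio)

    def forward(self, prompt_hidden: torch.Tensor) -> torch.Tensor:
        pooled = prompt_hidden.mean(dim=1)            # (batch, d_model) prompt summary
        scores = self.scorer(pooled)                  # (batch, d_ffn) channel scores
        topk = scores.topk(self.k, dim=-1).indices
        mask = torch.zeros_like(scores).scatter_(1, topk, 1.0)
        probs = scores.softmax(dim=-1)
        return mask + probs - probs.detach()          # straight-through gradients

class MaskedFFN(nn.Module):
    """An FFN whose hidden channels are switched off by the per-input mask."""
    def __init__(self, d_model: int, d_ffn: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ffn)
        self.down = nn.Linear(d_ffn, d_model)

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        h = F.gelu(self.up(x)) * mask.unsqueeze(1)    # zero out pruned channels
        return self.down(h)

predictor = SparsityPredictor(d_model=512, d_ffn=2048)
ffn = MaskedFFN(d_model=512, d_ffn=2048)
prompt = torch.randn(2, 16, 512)                      # (batch, seq, d_model)
out = ffn(prompt, predictor(prompt))                  # per-input masks, shared weights
```

Because the mask is a function of the input, a coding prompt and a math prompt activate different FFN channels drawn from the same underlying weights, which is what lets a single model specialize per request.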

The researchers propose a two-stage training process for IFPruning. In the first stage, the dense model continues pre-training on large-scale data while the sparsity predictor and the LLM are optimized jointly, producing a strong starting point for subsequent fine-tuning. In the second stage, training proceeds on supervised fine-tuning datasets with highly varied task prompts and multiple examples. The masks remain dynamic throughout: the sparsity predictor generates them online, pruning unnecessary weights without degrading model performance. This eliminates the need for parameter reloading, a limitation observed in prior dynamic methods.
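A compressed view of that recipe, under the assumption that both stages share one loop and differ only in their data: the `embed` hook, the `ffn_mask` keyword, the optimizer settings, and the loader names below are placeholders for illustration, not the paper's configuration.

```python
# Hypothetical two-stage training loop (all model APIs here are assumed):
# the predictor and the LLM are updated together, and the mask is applied
# by multiplication, so no parameters are ever reloaded or swapped out.
import torch
import torch.nn.functional as F

def train_stage(model, predictor, loader, steps: int):
    params = list(model.parameters()) + list(predictor.parameters())
    opt = torch.optim.AdamW(params, lr=1e-4)
    for _, (prompt_ids, target_ids) in zip(range(steps), loader):
        mask = predictor(model.embed(prompt_ids))     # per-input FFN mask
        logits = model(prompt_ids, ffn_mask=mask)     # same weights, masked on the fly
        loss = F.cross_entropy(logits.flatten(0, 1), target_ids.flatten())
        loss.backward()
        opt.step()
        opt.zero_grad()

# Stage 1: continued pre-training on large-scale data (joint optimization).
# train_stage(llm, sparsity_predictor, pretrain_loader, steps=100_000)
# Stage 2: supervised fine-tuning on varied task prompts; masks stay dynamic.
# train_stage(llm, sparsity_predictor, sft_loader, steps=10_000)
```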

The performance of IFPruning was rigorously evaluated across multiple benchmarks. For instance, pruning a 9B-parameter model to 3B improved coding-task accuracy by 8% over a dense 3B model, closely rivaling the unpruned 9B model. On mathematical datasets such as GSM8K and MATH, the dynamic pruning approach yielded a 5% increase in accuracy. It also showed consistent gains of around 4-6 percentage points on the instruction-following evaluations IFEval and AlpacaEval. Even on multi-task benchmarks such as MMLU, IFPruning produced robust results, demonstrating versatility across domains.

These results underscore the scalability of the IFPruning approach: models of varying sizes (6B, 9B, and 12B parameters) were tested, and all achieved significant performance improvements after pruning. Scaling the dense source model from 6B to 12B improved efficiency along with task-specific accuracy under the same conditions. Thanks to its dynamic sparsity mechanism, IFPruning also outperformed traditional structured pruning methods such as Pruning + Distill.

The introduction of IFPruning marks a significant advancement in optimizing LLMs, providing a method that dynamically balances efficiency and performance. The approach addresses the limitations of static pruning and MoE architectures, setting a new standard for resource-efficient language models. With its ability to adapt to varied inputs without sacrificing accuracy, IFPruning presents a promising solution for deploying LLMs on resource-constrained devices.

This research points toward further developments in model pruning, including the optimization of other components such as attention heads and hidden layers. Even though the methodology tackles many of the computational challenges, further research into server-side applications and multi-task pruning could broaden its scope of applicability. As a dynamic and efficient framework, IFPruning opens up possibilities for more adaptive and accessible large-scale language models.


Check out the Paper. All credit for this research goes to the researchers of this project.


