MarkTechPost@AI, October 20, 2024
Self-Data Distilled Fine-Tuning: A Solution for Pruning and Supervised Fine-tuning Challenges in LLMs

🤖 Self-data distilled fine-tuning: uses the original, unpruned model to generate a distilled dataset that preserves semantic richness and mitigates catastrophic forgetting by maintaining alignment with the base model's knowledge. The approach shows significant gains over standard SFT on the HuggingFace OpenLLM Leaderboard v1, improving average accuracy by up to 8%, and scales effectively across datasets, with quality improvements correlating positively with dataset size.

📈 Evaluating layer-importance metrics, pruning block sizes, and fine-tuning strategies: Block Importance (BI) and angular cosine metrics are compared to determine layer redundancy, yielding comparable results across block sizes. The method applies LoRA fine-tuning to both standard and self-distilled datasets, focusing on reasoning-heavy tasks, and evaluates models on ARC-C, GSM8k, and MMLU using LM-eval-harness.

🔍 Reducing catastrophic forgetting: the researchers compared sentence embeddings of models fine-tuned on supervised versus self-data distilled datasets. Compared with SFT, the self-data fine-tuned model better preserves the original model's learned representations.

📊 Llama3.1-8B Instruct models pruned at various block sizes are evaluated under three fine-tuning strategies: no fine-tuning, SFT, and self-data distillation. Pruned models without fine-tuning show substantial accuracy loss, highlighting the need for post-pruning adaptation. SFT improves quality, with an average recovery of 81.66% at block size 6, but struggles on reasoning-heavy tasks. Self-data distillation raises recovery to 91.24% at block size 6, with notable gains in GSM8k accuracy. Model merging via Spherical Linear Interpolation (SLERP) improves this further: at block size 6, the merged model reaches 93.30% recovery, surpassing the 91.24% of the model fine-tuned on OpenMathInstruct alone.

🚀 Future research includes combining this technique with complementary compression methods, adopting fine-tuning strategies that leverage dynamically generated datasets or multi-modal inputs, and extending these approaches to next-generation LLM architectures.

Large language models (LLMs) like GPT-4, Gemini, and Llama 3 have revolutionized natural language processing through extensive pre-training and supervised fine-tuning (SFT). However, these models come with high computational costs for training and inference. Structured pruning has emerged as a promising method to improve LLM efficiency by selectively removing less critical components. Despite its potential, depth-wise structured pruning faces challenges like accuracy degradation, especially in tasks that require multi-step reasoning. Pruning can disrupt information flow between layers, leading to poor model quality even after SFT. Also, fine-tuning can increase catastrophic forgetting, further degrading model quality. So, developing effective strategies to mitigate these challenges during pruning is crucial.

Existing attempts to address LLM efficiency include pruning for model compression, knowledge distillation, and methods to mitigate catastrophic forgetting. Pruning reduces model complexity but can yield inefficient acceleration or degraded model quality. Knowledge Distillation (KD) allows smaller models to learn from larger ones, with recent applications in both pre-training and fine-tuning; however, these techniques often trigger catastrophic forgetting, where models lose previously learned capabilities. Regularization techniques such as Elastic Weight Consolidation, along with architecture-based methods, have been applied to counter forgetting, but they carry their own limitations. Challenges therefore persist in maintaining model quality while improving efficiency, especially for complex reasoning tasks.

A team from Cerebras Systems has proposed self-data distilled fine-tuning, a method to address the challenges associated with pruning and SFT in large language models. The approach uses the original, unpruned model to generate a distilled dataset that preserves semantic richness and mitigates catastrophic forgetting by keeping the fine-tuning targets aligned with the base model's knowledge. It shows significant improvement over standard SFT, raising average accuracy by up to 8% on the HuggingFace OpenLLM Leaderboard v1, and scales effectively across datasets, with quality improvements correlating positively with dataset size.
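
In outline, the recipe is simple: before fine-tuning the pruned model, each gold response in the supervised dataset is regenerated by the original unpruned model, and those regenerated responses become the fine-tuning targets. A minimal sketch with Hugging Face transformers follows; the prompt template and generation settings are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of self-data distillation: the *unpruned* teacher model
# regenerates each training response so that fine-tuning targets stay
# aligned with the base model's own output distribution.
# Assumption: prompt format and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # original, unpruned model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

def self_distill_example(instruction: str, gold_response: str) -> dict:
    """Ask the unpruned model to restate the gold response in its own words."""
    prompt = (
        f"Instruction: {instruction}\n"
        f"Reference answer: {gold_response}\n"
        "Rewrite the reference answer in your own words:\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    distilled = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    # The pruned model is later fine-tuned on (instruction, distilled) pairs.
    return {"instruction": instruction, "response": distilled}
```

Because the targets now come from the base model's own output distribution, fine-tuning the pruned model on them pulls it back toward representations it already encodes, which is what mitigates catastrophic forgetting.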

The methodology involves evaluating layer importance metrics, pruning block sizes, and fine-tuning strategies. Block Importance (BI) and angular cosine metrics are compared to determine layer redundancy, yielding comparable results across block sizes. The proposed method uses LoRA fine-tuning on standard and self-distilled datasets, focusing on reasoning-heavy tasks, and models are evaluated on ARC-C, GSM8k, and MMLU using LM-eval-harness. To assess catastrophic forgetting, the researchers compared sentence embeddings of models fine-tuned on supervised versus self-data distilled datasets: the self-data fine-tuned model preserves the original model's learned representations more faithfully than its SFT counterpart.
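
As an illustration of how an angular cosine redundancy measure can flag prunable blocks of layers, here is a minimal PyTorch sketch: hidden states entering and leaving a candidate block are compared, and the block whose output direction changes least is the most redundant. The helper names and normalization are assumptions for illustration; the paper's precise formulation may differ.

```python
# Angular cosine redundancy score over a contiguous block of layers.
# Lower score = block output closely parallels its input = more redundant.
import torch

def angular_distance(h_in: torch.Tensor, h_out: torch.Tensor) -> float:
    """Mean angular distance (normalized to [0, 1]) between hidden states
    before and after a block of layers."""
    cos = torch.nn.functional.cosine_similarity(h_in, h_out, dim=-1)
    # arccos maps cosine similarity to an angle; divide by pi to normalize.
    return (torch.arccos(cos.clamp(-1.0, 1.0)) / torch.pi).mean().item()

def most_redundant_block(hidden_states: list, block_size: int) -> int:
    """hidden_states[i] is the hidden state entering layer i (e.g. from a
    forward pass with output_hidden_states=True). Returns the start index
    of the most redundant contiguous block of `block_size` layers."""
    scores = [
        angular_distance(hidden_states[i], hidden_states[i + block_size])
        for i in range(len(hidden_states) - block_size)
    ]
    return min(range(len(scores)), key=scores.__getitem__)
```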

The Llama3.1-8B Instruct models pruned at various block sizes are evaluated under three fine-tuning strategies: no fine-tuning, SFT, and self-data distillation. Pruned models without fine-tuning show a substantial loss in accuracy, highlighting the need for post-pruning adaptation. While SFT improved quality, achieving an average recovery of 81.66% at block size 6, it struggled with reasoning-heavy tasks. Self-data distillation significantly enhanced quality recovery, reaching 91.24% at block size 6, with marked improvements in GSM8k accuracy. Moreover, self-data distillation is improved further by model merging via Spherical Linear Interpolation (SLERP): at block size 6, the merged model achieved a 93.30% recovery, outperforming the 91.24% of the model fine-tuned on OpenMathInstruct alone.
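
For reference, SLERP interpolates weights along the great-circle arc between two checkpoints rather than along the straight line, preserving the angular geometry of the parameters. The sketch below applies it tensor by tensor at a single interpolation factor; it is a minimal illustration under that assumption, not the exact merging recipe used in the paper.

```python
# Minimal per-tensor SLERP (spherical linear interpolation) between two
# fine-tuned checkpoints; illustrative sketch only.
import torch

def slerp(w0: torch.Tensor, w1: torch.Tensor, t: float,
          eps: float = 1e-8) -> torch.Tensor:
    """Interpolate a fraction t along the arc from w0 to w1."""
    a, b = w0.flatten().float(), w1.flatten().float()
    cos_omega = torch.dot(a, b) / (a.norm() * b.norm() + eps)
    omega = torch.arccos(cos_omega.clamp(-1.0, 1.0))
    if omega.abs() < eps:  # nearly parallel: fall back to linear interpolation
        return (1 - t) * w0 + t * w1
    s0 = torch.sin((1 - t) * omega) / torch.sin(omega)
    s1 = torch.sin(t * omega) / torch.sin(omega)
    return (s0 * a + s1 * b).reshape(w0.shape).to(w0.dtype)

def merge_state_dicts(sd0: dict, sd1: dict, t: float = 0.5) -> dict:
    """Merge two checkpoints tensor by tensor, e.g. at t = 0.5."""
    return {k: slerp(sd0[k], sd1[k], t) for k in sd0}
```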

In conclusion, the team introduced self-data distilled fine-tuning, an effective method to counteract quality degradation in pruned Llama3.1-8B Instruct models. This approach outperforms standard SFT, showing superior accuracy recovery post-pruning across various tasks on the HuggingFace OpenLLM Leaderboard v1. The findings in this paper establish self-data distilled fine-tuning as a critical tool for maintaining high model quality post-pruning, providing an efficient solution for large-scale model compression. Future research includes integrating this technique with complementary compression methods, adopting fine-tuning strategies that leverage dynamically generated datasets or multi-modal inputs, and extending these methodologies to next-generation LLM architectures.


Check out the Paper. All credit for this research goes to the researchers of this project.
