MarkTechPost@AI July 7, 2024
DeepSeek AI Researchers Propose Expert-Specialized Fine-Tuning, or ESFT, to Reduce Memory by up to 90% and Time by up to 30%

Researchers from DeepSeek AI and Northwestern University have proposed a new method called Expert-Specialized Fine-Tuning (ESFT), designed specifically for sparse-architecture large language models that use a mixture-of-experts (MoE) design. ESFT improves tuning efficiency by fine-tuning only the experts most relevant to a given task while freezing the other experts and model components, effectively reducing computational costs while preserving expert specialization.

🚀 Across a variety of downstream tasks, ESFT not only matches the performance of traditional full-parameter fine-tuning but often surpasses it. In tasks such as math and code, for example, ESFT achieves significant performance gains while maintaining a high degree of specialization.

💡 ESFT exploits the MoE architecture's inherent ability to assign different tasks to different experts, ensuring that only the necessary parameters are updated. By computing each expert's affinity score on task-specific data and selecting the subset of experts most relevant to the task, ESFT significantly reduces the computational cost of fine-tuning. Experimental results show that, compared with full-parameter fine-tuning, ESFT reduces storage requirements by up to 90% and training time by up to 30%.

💪 ESFT's effectiveness lies in maintaining high performance across a variety of tasks while substantially improving training efficiency and reducing storage and training time. This makes ESFT a promising approach for the future of large language model customization.

Natural language processing is advancing rapidly, with much of the effort focused on optimizing large language models (LLMs) for specific tasks. Because these models often contain billions of parameters, customizing them is a significant challenge. The goal is to fine-tune them for specific downstream tasks without prohibitive computational costs, which calls for parameter-efficient fine-tuning (PEFT) approaches that maximize performance while minimizing resource usage.

One major problem in this domain is the resource-intensive nature of customizing LLMs for specific tasks. Traditional fine-tuning methods typically update all model parameters, which can lead to high computational costs and overfitting. Given the scale of modern LLMs, such as those with sparse architectures that distribute tasks across multiple specialized experts, there is a pressing need for more efficient fine-tuning techniques. The challenge lies in optimizing performance while ensuring the computational burden remains manageable.

Existing methods for PEFT in dense-architecture LLMs include low-rank adaptation (LoRA) and P-Tuning. These methods generally involve adding new parameters to the model or selectively updating existing ones. For instance, LoRA decomposes weight matrices into low-rank components, which helps reduce the number of parameters that need to be trained. However, these approaches have primarily focused on dense models and do not fully exploit the potential of sparse-architecture LLMs. In sparse models, different tasks activate different subsets of parameters, making traditional methods less effective.
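
To make the contrast with full fine-tuning concrete, here is a minimal sketch of the LoRA idea in PyTorch: the pretrained weight matrix stays frozen, and a trainable low-rank update BA is added on top. The class name, rank, and scaling convention are illustrative choices, not the API of any particular library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update: y = Wx + (BA)x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        # A is initialized with small noise and B with zeros, so training
        # starts from the pretrained model's behavior.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only lora_a and lora_b receive gradients during fine-tuning.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
```

Because only the two small matrices are trained, the number of updated parameters scales with the rank rather than with the full weight matrix, which is the source of LoRA's savings.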

DeepSeek AI and Northwestern University researchers have introduced a novel method called Expert-Specialized Fine-Tuning (ESFT) tailored for sparse-architecture LLMs, specifically those using a mixture-of-experts (MoE) architecture. This method aims to fine-tune only the most relevant experts for a given task while freezing the other experts and model components. By doing so, ESFT enhances tuning efficiency and maintains the specialization of the experts, which is crucial for optimal performance. The ESFT method capitalizes on the MoE architecture’s inherent ability to assign different tasks to experts, ensuring that only the necessary parameters are updated.

In more detail, ESFT computes each expert's affinity score on task-specific data and selects the subset of experts with the highest relevance. These selected experts are then fine-tuned while the rest of the model remains unchanged. This selective approach significantly reduces the computational costs associated with fine-tuning: ESFT can reduce storage requirements by up to 90% and training time by up to 30% compared to full-parameter fine-tuning. This efficiency is achieved without compromising the model's overall performance, as demonstrated by the experimental results.
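
The selection step described above could look like the following PyTorch sketch. It assumes a generic MoE model whose router emits per-token gate scores over experts; the affinity metric (mean gate score over the task's tokens), the cumulative-affinity stopping rule, and the `layers.{i}.experts.{j}` parameter naming are assumptions made for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

def expert_affinity(gate_scores: torch.Tensor) -> torch.Tensor:
    """Mean router gate score per expert over the task's tokens.

    gate_scores: (num_tokens, num_experts) softmax outputs of the MoE router,
    collected by running the frozen model over a sample of task data.
    """
    return gate_scores.mean(dim=0)

def select_experts(affinity: torch.Tensor, threshold: float = 0.2) -> list[int]:
    """Greedily take the highest-affinity experts until their share of the
    total affinity reaches `threshold`; all other experts stay frozen."""
    order = torch.argsort(affinity, descending=True)
    share = affinity / affinity.sum()
    selected, cumulative = [], 0.0
    for idx in order.tolist():
        selected.append(idx)
        cumulative += share[idx].item()
        if cumulative >= threshold:
            break
    return selected

def freeze_all_but(model: nn.Module, experts_per_layer: dict[int, list[int]]) -> None:
    """Freeze every parameter, then re-enable gradients only for the chosen
    experts in each MoE layer (the naming scheme here is hypothetical)."""
    for param in model.parameters():
        param.requires_grad = False
    for name, param in model.named_parameters():
        for layer, experts in experts_per_layer.items():
            if any(f"layers.{layer}.experts.{e}." in name for e in experts):
                param.requires_grad = True
```

After freezing, an ordinary training loop over the task data updates only the selected experts, which is where the storage and training-time savings come from.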

In various downstream tasks, ESFT not only matched but often surpassed the performance of traditional full-parameter fine-tuning methods. For example, in tasks like math and code, ESFT achieved significant performance gains while maintaining a high degree of specialization. The method’s ability to efficiently fine-tune a subset of experts, selected based on their relevance to the task, highlights its effectiveness. The results showed that ESFT maintained general task performance better than other PEFT methods like LoRA, making it a versatile and powerful tool for LLM customization.

In conclusion, the research introduces Expert-Specialized Fine-Tuning (ESFT) as a solution to the problem of resource-intensive fine-tuning in large language models. By selectively tuning relevant experts, ESFT optimizes both performance and efficiency. This method leverages the specialized architecture of sparse-architecture LLMs to achieve superior results with reduced computational costs. The research demonstrates that ESFT can significantly improve training efficiency, reduce storage and training time, and maintain high performance across various tasks. This makes ESFT a promising approach for future developments in customizing large language models. 


Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

