MarkTechPost@AI October 18, 2024
Model Kinship: The Degree of Similarity or Relatedness between LLMs, Analogous to Biological Evolution

Large language models (LLMs) have made remarkable progress in recent years, and fine-tuning pre-trained models for specific tasks has become common practice. However, this approach is resource-inefficient when a separate model must be deployed for each task. The growing demand for multitask learning solutions has led researchers to explore model merging as a viable alternative: the technique integrates multiple expert models to meet multitask objectives and offers a promising path for LLM evolution. Despite progress in model-merging toolkits and the development of more powerful LLMs through iterative merging, the process still relies heavily on trial and error and human expertise. As merging iterations progress, further generalization gains become increasingly hard to achieve, highlighting the need for a deeper understanding of the mechanisms driving these improvements.

👨‍🔬 The researchers introduce the concept of "model kinship," inspired by evolutionary biology, to estimate the relatedness between LLMs during iterative model merging. The metric offers valuable insights that can improve merging strategies and enables a comprehensive analysis of the merging process from multiple perspectives. The study shows a strong correlation between gains in multitask capability and model kinship, which can guide the selection of candidate models for merging.

📈 The researchers identify two distinct stages in the model merging process: a learning stage, with significant performance improvements, and a saturation stage, where improvements diminish. This observation points to optimization challenges such as local-optimum traps. To mitigate these issues, they propose a robust strategy called "Top-k Greedy Merging with Model Kinship."

🚀 The paper's main contributions include introducing model kinship as a tool for assessing LLM relatedness, a comprehensive empirical analysis of model evolution through iterative merging, and practical model-merging strategies that exploit model kinship. These advances aim to improve the efficiency and effectiveness of model evolution, potentially revolutionizing auto-merging research in the LLM field.

📊 The researchers ran two iterative model-merging experiments using SLERP (spherical linear interpolation) and the Mergekit toolkit. They compared two strategies: Top-k Greedy Merging and Top-k Greedy Merging with Model Kinship, the latter adding an extra exploration step based on model kinship to discover potentially better solutions.

🧪 Results show that both strategies achieve the multitask goal, but the vanilla greedy strategy stops improving after Generation 2, with average task performance plateauing at 68.72. In contrast, the kinship-based method keeps improving, reaching 69.13 by Generation 5 and effectively escaping the local optimum.

💡 An analysis of weight changes shows that merging models with low kinship introduces distinctive variations into the weight space, helping escape local optima. This is evident in the significant weight changes observed when merging with exploration models.

⏱️ The researchers also find that model kinship can serve as an effective early-stopping criterion: when the model kinship between the top-performing models exceeds 0.9, it indicates convergence. Using this as a stopping condition improved time efficiency by roughly 30% with little or no performance loss.

Large Language Models (LLMs) have gained significant traction in recent years, with fine-tuning pre-trained models for specific tasks becoming a common practice. However, this approach is resource-inefficient when a separate model must be deployed for each task. The growing demand for multitask learning solutions has led researchers to explore model merging as a viable alternative. This technique integrates multiple expert models to achieve multitask objectives, offering a promising path for LLM evolution. Despite advancements in model merging toolkits and the development of more powerful LLMs through iterative merging, the process largely relies on trial and error and human expertise. As merging iterations progress, achieving further generalization gains becomes increasingly challenging, highlighting the need for a deeper understanding of the underlying mechanisms driving these advancements.

Researchers have explored various approaches to address the challenges of model merging and multitask learning in LLMs. Weight averaging, originating from Utans’ work in 1996, has been widely applied in deep neural networks for combining checkpoints, utilizing task-specific information, and parallel training of LLMs. The discovery of Linear Mode Connectivity (LMC) expanded the use of weight averaging in fusing fine-tuned models. Further studies have explored optimizable weights for merging, such as FisherMerging, RegMean, AdaMerging, and MaTS.

Task vectors and parameter interference reduction techniques like TIES and DARE have been developed to enhance multitask learning and prevent conflicts during merging. Model Breadcrumbs demonstrated the benefits of removing outlier parameters to reduce noise. For merging models with different initializations, methods exploiting neural network permutation symmetry and alignment techniques have been proposed.

Recent work has focused on “model evolution,” with approaches like CoLD Fusion for iterative fusion, automated merging tools on platforms like Hugging Face, and Evolutionary Model Merge employing evolutionary techniques to optimize model combinations. These advancements aim to uncover hidden patterns in the merging process that human intuition alone might miss.

Researchers from Zhejiang University and the National University of Singapore (NUS-NCS Joint Lab) introduce model kinship, drawing inspiration from evolutionary biology, to estimate the relatedness between LLMs during iterative model merging. This metric offers valuable insights to enhance merging strategies and enables comprehensive analysis of the merging process from multiple perspectives. The study reveals a strong correlation between multitask capability improvements and model kinship, which can guide the selection of candidate models for merging.
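
The article does not spell out the formula, but model kinship is computed from the models' "delta parameters" (each fine-tuned model's weights minus those of a shared base model) using a standard similarity measure. Below is a minimal sketch of one plausible variant, cosine similarity over flattened weight deltas; the function name and the assumption that all three checkpoints share the same architecture are illustrative, not taken from the authors' code.

```python
import torch
import torch.nn.functional as F

def model_kinship(model_a, model_b, base_model) -> float:
    """Cosine-similarity variant of model kinship over delta parameters.

    Each fine-tuned model is represented by its weight change relative to a
    shared base model; kinship is the cosine similarity of the two flattened
    delta vectors. Assumes identical architectures and parameter names.
    """
    params_a = dict(model_a.named_parameters())
    params_b = dict(model_b.named_parameters())
    deltas_a, deltas_b = [], []
    for name, p_base in base_model.named_parameters():
        deltas_a.append((params_a[name] - p_base).detach().flatten())
        deltas_b.append((params_b[name] - p_base).detach().flatten())
    vec_a, vec_b = torch.cat(deltas_a), torch.cat(deltas_b)
    return F.cosine_similarity(vec_a, vec_b, dim=0).item()
```

A score near 1 means the two models encode very similar changes to the base model, while a low score signals that merging them would combine more diverse knowledge.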

The research identifies two distinct stages in the model merging process: a learning stage with significant performance improvements and a saturation stage where improvements diminish. This observation suggests the presence of optimization challenges, such as local optima traps. To mitigate these issues, the researchers propose a robust strategy called Top-k Greedy Merging with Model Kinship.
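
The article does not give pseudocode for this strategy, but the described procedure, greedily merging the strongest models each generation plus a kinship-guided exploration step, can be sketched roughly as follows. All names (`evolve`, `merge_pair`, `evaluate`, `kinship`) and the exact selection rule are an illustrative reconstruction under those assumptions, not the authors' implementation.

```python
def evolve(seed_models, evaluate, merge_pair, kinship, k=2, n_generations=5):
    """Illustrative sketch of Top-k Greedy Merging with Model Kinship.

    `merge_pair(a, b)` merges two models (e.g. via SLERP), `evaluate(m)`
    returns average multitask performance, and `kinship(a, b)` scores how
    related two models are.
    """
    pool = list(seed_models)
    for _ in range(n_generations):
        pool.sort(key=evaluate, reverse=True)
        best, runners_up = pool[0], pool[1:k]

        # Exploitation: greedily merge the best model with the other top-k models.
        new_models = [merge_pair(best, other) for other in runners_up]

        # Exploration: also merge the best model with its lowest-kinship
        # relative, injecting novel variations into the weight space.
        explorer = min(pool[1:], key=lambda m: kinship(best, m))
        new_models.append(merge_pair(best, explorer))

        pool.extend(new_models)
    return max(pool, key=evaluate)
```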

The paper’s key contributions include the introduction of model kinship as a tool for assessing LLM relatedness, a comprehensive empirical analysis of model evolution through iterative merging, and practical model merging strategies utilizing model kinship. These advancements aim to improve the efficiency and effectiveness of model evolution, potentially revolutionizing auto-merging research in the field of LLMs.

The researchers conducted two iterative model merging experiments using SLERP (Spherical Linear Interpolation) and the Mergekit toolkit. They compared two strategies: Top-k Greedy Merging and Top-k Greedy Merging with Model Kinship. The latter introduced an additional exploration step based on model kinship to discover potentially better solutions.
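
SLERP interpolates along the great-circle arc between two weight vectors rather than along a straight line, which better preserves each model's weight geometry. A minimal per-tensor sketch is below; the interpolation factor `t` and the fallback to linear interpolation for near-parallel vectors are standard implementation choices, not details reported in the article.

```python
import torch

def slerp(w0: torch.Tensor, w1: torch.Tensor, t: float = 0.5,
          eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    Treats each tensor as a flat vector, interpolates along the arc between
    them, and falls back to linear interpolation when they are nearly parallel.
    """
    v0, v1 = w0.flatten(), w1.flatten()
    cos_theta = torch.dot(v0, v1) / (v0.norm() * v1.norm() + eps)
    theta = torch.acos(cos_theta.clamp(-1.0, 1.0))
    if theta.abs() < eps:  # vectors are (almost) parallel
        merged = (1 - t) * v0 + t * v1
    else:
        sin_theta = torch.sin(theta)
        merged = (torch.sin((1 - t) * theta) / sin_theta) * v0 \
               + (torch.sin(t * theta) / sin_theta) * v1
    return merged.reshape(w0.shape)
```

Applying this tensor-by-tensor across two checkpoints yields the merged model; in practice the experiments use the Mergekit toolkit's SLERP implementation rather than a hand-rolled one.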

Results showed that both strategies achieved multi-task goals, but the vanilla greedy strategy stopped improving after Generation 2, stabilizing at an average task performance of 68.72. In contrast, the kinship-based method continued to improve, reaching 69.13 by Generation 5, effectively escaping local optima.

Analysis of weight changes revealed that merging models with low kinship introduced unique variations into the weight space, helping to escape local optima. This was evident in the significant weight changes observed when merging with exploration models.

The researchers also found that model kinship could serve as an effective early-stopping criterion. When model kinship between top-performing models exceeded 0.9, it indicated convergence. Implementing this as a stopping condition improved time efficiency by approximately 30% with minimal or no performance reduction.
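
As a rough illustration, that convergence check could be added to the merging loop as follows. The 0.9 threshold comes from the article; the function names mirror the earlier sketches and are otherwise hypothetical.

```python
def should_stop(top_models, kinship, threshold=0.9) -> bool:
    """Early-stopping rule: halt merging once the top-performing models are
    all highly related, i.e. every pairwise kinship exceeds the threshold."""
    pairs = [(a, b) for i, a in enumerate(top_models) for b in top_models[i + 1:]]
    return all(kinship(a, b) > threshold for a, b in pairs)
```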

This research introduces model kinship to guide the merging of Large Language Models, providing insights into the model evolution process. The proposed Top-k Greedy Merging with Model Kinship strategy demonstrates effectiveness in escaping local optima traps and enabling further improvements. Model kinship also serves as an early stopping criterion, reducing computational waste. Drawing parallels to biological hybridization, this work explores autonomous model evolution through merging. As language models continue to advance, these findings offer valuable insights into optimizing their development and performance, paving the way for more efficient and effective LLM evolution strategies.


