MarkTechPost@AI December 14, 2024
TIME Framework: A Novel Machine Learning Unifying Framework Breaking Down Temporal Model Merging

This article introduces the “TIME” framework proposed by the University of Tübingen, which addresses temporal model merging, particularly in settings where new tasks and information continuously emerge. The TIME framework is organized around three major axes: the initialization of expert models, merging at deployment time, and merging techniques applied over time, and it systematically evaluates existing techniques along each. The study shows that the merging strategy itself has relatively little impact, while choosing the best initialization and deployment strategies is critical. With the “BEST-IN-TIME” strategy, temporal model merging scales efficiently across model sizes and numbers of tasks, and additional compute further improves its effectiveness. In addition, methods such as data replay and non-uniform weighting can effectively lift offline merging performance close to the level of continual training.

⚙️ The TIME framework is organized around three major axes of temporal model merging: initialization of expert models, merging at deployment time, and merging techniques applied over time. The framework aims to systematically evaluate existing techniques and provide guidance for temporal model merging.

💡 The study finds that the merging strategy itself has little effect on the results, while choosing the best initialization and deployment strategies is crucial. In other words, in temporal model merging, how you start (initialization) and how you deploy (deployment) the model matters more than the specific merging method.

📈 With the “BEST-IN-TIME” strategy, temporal model merging scales efficiently across model sizes and numbers of tasks. The approach therefore applies effectively to models and task sets of different scales, and its effectiveness improves further as compute increases.

💾 Methods such as data replay and non-uniform weighting can significantly improve the performance of offline merging, bringing it close to the level of continual training. These methods compensate for the shortcomings of offline merging by reintroducing historical data or assigning higher weights to recent tasks.

Model Merging allows one to leverage the expertise of specific fine-tuned models as a single powerful entity. The concept is straightforward: teach variants of a base foundation model on independent tasks until they become experts, and then assemble these experts as one. However, new concepts, domains, and tasks are emerging at an ever-increasing rate, leaving the possibility of them being insufficiently covered during pre-training—after all, there is only so much a model can learn at once! Temporal Model Merging addresses this by integrating the knowledge of expert models as they become available.
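At its core, this kind of merging operates directly in weight space: experts fine-tuned from the same base checkpoint have aligned parameters, so they can be combined by a (weighted) element-wise average. The sketch below is a minimal illustration of that idea, not the paper’s implementation; the `merge_experts` helper and its default uniform weighting are assumptions.

```python
# Minimal weight-space merging sketch: average expert checkpoints that were all
# fine-tuned from the same base model, so parameters align tensor by tensor.
from typing import Dict, List, Optional

import torch


def merge_experts(experts: List[Dict[str, torch.Tensor]],
                  weights: Optional[List[float]] = None) -> Dict[str, torch.Tensor]:
    """Return a (weighted) parameter-wise average of expert state dicts."""
    if weights is None:
        weights = [1.0 / len(experts)] * len(experts)  # uniform merge by default
    merged = {}
    for name in experts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, experts))
    return merged


# Usage: merged_state = merge_experts([expert_a.state_dict(), expert_b.state_dict()])
```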

A multitude of questions arise when considering temporal model merging, such as whether it is affected by the choice of training initialization, what the best techniques over time are, and whether it is beneficial to change strategies between training and deployment. This article discusses the latest research that attempts to answer these questions and explores various aspects of model merging over time.

Researchers from the University of Tübingen introduced “TIME” (Temporal Integration of Model Expertise) in their latest paper, ‘How to Merge Your Multimodal Models Over Time?’. TIME is a unified framework structured around three major axes of temporal model merging: initialization of experts, merging for deployment at a given time point, and merging techniques applied over time. TIME systematically assesses existing techniques by studying them along each axis. It considers both standard model merging and continual pretraining, making it a generic framework.

The authors define a five-stage update pipeline for every task to incorporate all three axes of temporal model merging (a minimal sketch of this loop follows the list). These steps are:

1) Init: The user chooses an initialization protocol to produce initialization weights at time t.

2) Train: With the weights obtained from step 1, the user trains the model on a given task to produce the expert.

3) Store: The trained weights are appended to the storage of model expert weights.

4) Deploy: The user chooses a deployment protocol to produce the output weights.

5) Eval: The deployed model is used for downstream applications and evaluation.
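A minimal sketch of this loop is shown below; the helper callables (`choose_init`, `train_on_task`, `choose_deploy`, `evaluate`) are hypothetical placeholders for whichever protocols a user selects along the three axes, not the paper’s actual API.

```python
# Sketch of the five-stage temporal merging pipeline described above.
# All callables are user-supplied placeholders, not a real library API.
def temporal_merging_loop(base_weights, tasks,
                          choose_init, train_on_task, choose_deploy, evaluate):
    storage = []   # growing pool of trained expert weights
    scores = []
    for task in tasks:
        init_w = choose_init(base_weights, storage)        # 1) Init
        expert_w = train_on_task(init_w, task)             # 2) Train
        storage.append(expert_w)                           # 3) Store
        deployed_w = choose_deploy(base_weights, storage)  # 4) Deploy
        scores.append(evaluate(deployed_w))                # 5) Eval
    return scores
```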

To study temporal model merging in continual pretraining, the authors used the FOMO-in-Flux benchmark, which includes several adaptation and evaluation datasets covering a range of visual and semantic distribution shifts. The foundation model chosen for fine-tuning was ViT-B/16 CLIP, and the evaluation metrics were Knowledge Accumulation (how well the model learns new tasks) and Zero-Shot Retention (how much of the initial model’s zero-shot capabilities are preserved).

The researchers first studied the static offline merging approach, where the temporal aspect is ignored, and found marginal differences between various strategies. Offline merging with every technique produced similar results but struggled with knowledge acquisition. Consequently, continual training performed better. The paper further discusses potential measures to bridge the gap between offline and continual merging. One proposed solution is applying data replay on top of standard offline merging, which significantly boosted performance from 54.6% to 58.2%. The authors also explored offline temporal ordering via non-uniform weighting, assigning higher weights to recent tasks to account for temporal events. This increased performance to 58.9%, very close to the replay baseline of 59.1%.
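As an illustration of non-uniform temporal weighting, the sketch below assigns exponentially decaying merge coefficients so that the most recent expert contributes the most; the decay factor `gamma` and the normalization are illustrative assumptions rather than the exact scheme evaluated in the paper.

```python
# Illustrative recency-biased weighting for offline temporal merging.
from typing import Dict, List

import torch


def recency_weighted_merge(experts: List[Dict[str, torch.Tensor]],
                           gamma: float = 0.7) -> Dict[str, torch.Tensor]:
    """Merge experts, giving exponentially larger weights to more recent tasks."""
    n = len(experts)
    raw = [gamma ** (n - 1 - i) for i in range(n)]  # latest expert gets weight 1.0
    weights = [r / sum(raw) for r in raw]           # normalize to sum to 1
    merged = {}
    for name in experts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, experts))
    return merged
```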

The experiments confirmed that the specific merging technique used is much less important than selecting the best initialization and deployment strategies. Thus, the authors developed a “BEST-IN-TIME” initialization and deployment strategy, which they used to study the scalability of temporal model merging across model size, compute budgets, and the number of tasks. This analysis revealed that temporal model merging with BEST-IN-TIME scales efficiently across model sizes and tasks, with compute scaling further improving its effectiveness.

Conclusion: TIME addresses temporal multimodal model merging, especially in the context of continuously emerging tasks and information, through a systematic study across three axes. The analysis provided important insights into the roles of initialization, deployment, and merging strategies, with merging strategies having minimal impact on the overall results. The paper also emphasized the significance of temporal merging itself, as evidenced by the underperformance of offline merging relative to continual training baselines.


