MarkTechPost@AI, December 19, 2024
Google DeepMind Introduces ‘SALT’: A Machine Learning Approach to Efficiently Train High-Performing Large Language Models using SLMs

This article introduces SALT, a new training method for large language models proposed by Google DeepMind. The method innovatively uses a small language model (SLM) to assist in training a large language model (LLM): the SLM's predictive distribution is transferred to the LLM through knowledge distillation, and the SLM is also used to select the data subsets most valuable for learning. SALT proceeds in two phases: in the first, the SLM acts as a teacher guiding the LLM's training; in the second, the LLM switches to conventional self-supervised learning. Experimental results show that SALT reduces compute consumption while markedly improving LLM performance, especially on reading comprehension, commonsense reasoning, and natural language inference tasks. SALT offers resource-constrained institutions an efficient new route to training large models.

👨‍🏫 SALT innovatively uses a small language model (SLM) as an auxiliary teacher: early in training, the SLM's predictive distribution is transferred to the large language model (LLM) via knowledge distillation, improving the LLM's training efficiency.

🎯 For data selection, the SLM identifies data subsets that are both challenging and learnable, letting the LLM concentrate on these key examples early in training and avoiding unnecessary waste of compute.

⏱️ Experiments show that a 2.8-billion-parameter LLM trained with SALT on the Pile dataset outperforms a conventionally trained model, with better results on reading comprehension, commonsense reasoning, and natural language inference benchmarks, while cutting training time by roughly 28%.

💡 After pre-training, SALT-trained models also generalize better in few-shot evaluations and on downstream tasks, further confirming the method's effectiveness at improving model quality.

Large Language Models (LLMs) are the backbone of numerous applications, such as conversational agents, automated content creation, and natural language understanding tasks. Their effectiveness lies in their ability to model and predict complex language patterns from vast datasets. However, developing LLMs presents a major challenge due to the immense computational cost of training. This involves optimizing models with billions of parameters over massive corpora, requiring extensive hardware and time. Consequently, there is a need for innovative training methodologies that can mitigate these challenges while maintaining or enhancing the quality of LLMs.

In developing LLMs, traditional training approaches are inefficient, as they treat all data equally, regardless of complexity. These methods do not prioritize specific subsets of data that could expedite learning, nor do they leverage existing models to assist in training. This often results in unnecessary computational effort, as simpler instances are processed alongside complex ones without differentiation. Also, standard self-supervised learning, where models predict the next token in a sequence, fails to utilize the potential of smaller, less computationally expensive models that can inform and guide the training of larger models.

Knowledge distillation (KD) is commonly employed to transfer knowledge from larger, well-trained models to smaller, more efficient ones. The reverse direction, in which smaller models assist in training larger ones, has rarely been explored. This gap represents a missed opportunity: despite their limited capacity, smaller models can provide valuable insights into specific regions of the data distribution. They can efficiently identify “easy” and “hard” instances, which can significantly influence the training dynamics of LLMs.

Google Research and Google DeepMind researchers introduced a novel approach called Small model Aided Large model Training (SALT) to address the above challenges. This method innovatively employs smaller language models (SLMs) to improve the efficiency of LLM training. SALT leverages SLMs in two ways: providing soft labels as an additional source of supervision during the initial training phase and selecting subsets of data that are particularly valuable for learning. The approach ensures that LLMs are guided by SLMs in prioritizing informative and challenging data sequences, thereby reducing computational requirements while improving the overall quality of the trained model.
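A minimal way to picture the soft-label supervision is as a distillation term blended with the usual next-token loss during the early phase of training. The sketch below is an illustration under that assumption, not the paper's implementation; the model interfaces, the mixing weight alpha, and the temperature are hypothetical.

```python
# Minimal sketch of SLM-to-LLM soft-label supervision (phase 1), assuming both
# models are causal LMs that return logits of shape (batch, seq_len, vocab).
# `alpha` and `temperature` are hypothetical hyperparameters, not values from the paper.
import torch.nn.functional as F

def salt_phase1_loss(llm_logits, slm_logits, targets, alpha=0.5, temperature=1.0):
    vocab = llm_logits.size(-1)
    # Standard self-supervised next-token cross-entropy on the LLM's own logits.
    ce = F.cross_entropy(llm_logits.reshape(-1, vocab), targets.reshape(-1))
    # Distillation term: pull the LLM's predictive distribution toward the frozen
    # SLM teacher's distribution, token by token.
    t = temperature
    kd = F.kl_div(
        F.log_softmax(llm_logits.reshape(-1, vocab) / t, dim=-1),
        F.softmax(slm_logits.reshape(-1, vocab).detach() / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
    # Blend the two sources of supervision during the first training phase.
    return alpha * kd + (1.0 - alpha) * ce
```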

SALT operates through a two-phase methodology:

    In the first phase, SLMs act as teachers, transferring their predictive distributions to the LLMs via knowledge distillation. This process focuses on aligning the LLM’s predictions with those of the SLM in areas where the latter excels. Also, SLMs identify subsets of data that are both challenging and learnable, enabling the LLM to concentrate on these critical examples early in training.

    The second phase transitions to traditional self-supervised learning, allowing the LLM to independently refine its understanding of more complex data distributions.

This two-stage process balances leveraging the strengths of SLMs and maximizing the inherent capabilities of LLMs.
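Combining the two ideas, a training loop under this scheme might look like the sketch below, which reuses the salt_phase1_loss function from the earlier snippet. The selection rule, keeping sequences whose SLM loss falls in a middle band as a proxy for "challenging yet learnable", and the knobs phase1_steps, low, and high are illustrative assumptions rather than the paper's exact criteria.

```python
# Minimal two-phase SALT-style training loop (a sketch, not the paper's code).
# `llm` and `slm` are assumed to be callables returning logits of shape
# (batch, seq_len, vocab); `dataloader` yields (inputs, targets) token batches.
import torch
import torch.nn.functional as F

def per_sequence_slm_loss(slm_logits, targets):
    # Average next-token loss of each sequence under the small model.
    vocab = slm_logits.size(-1)
    token_loss = F.cross_entropy(
        slm_logits.reshape(-1, vocab), targets.reshape(-1), reduction="none"
    )
    return token_loss.view(targets.shape).mean(dim=-1)  # shape: (batch,)

def select_challenging_learnable(slm_logits, targets, low=2.0, high=5.0):
    # Illustrative proxy: keep sequences the SLM finds neither trivial nor hopeless.
    seq_loss = per_sequence_slm_loss(slm_logits, targets)
    return (seq_loss > low) & (seq_loss < high)

def train_salt(llm, slm, dataloader, optimizer, phase1_steps=10_000):
    for step, (inputs, targets) in enumerate(dataloader):
        llm_logits = llm(inputs)
        if step < phase1_steps:
            # Phase 1: the SLM supplies soft labels and filters the batch.
            with torch.no_grad():
                slm_logits = slm(inputs)
            keep = select_challenging_learnable(slm_logits, targets)
            if not keep.any():
                continue
            loss = salt_phase1_loss(llm_logits[keep], slm_logits[keep], targets[keep])
        else:
            # Phase 2: conventional self-supervised next-token training.
            vocab = llm_logits.size(-1)
            loss = F.cross_entropy(llm_logits.reshape(-1, vocab), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

In practice the SLM's per-sequence losses could be precomputed over the corpus so the small model need not run inside the training loop; the inline version here just keeps the sketch short.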

In experimental results, a 2.8-billion-parameter LLM trained with SALT on the Pile dataset outperformed a baseline model trained using conventional methods. Notably, the SALT-trained model achieved better results on benchmarks such as reading comprehension, commonsense reasoning, and natural language inference while utilizing only 70% of the training steps. This translated to a reduction of approximately 28% in wall-clock training time. Also, the LLM pre-trained using SALT demonstrated a 58.99% accuracy in next-token prediction compared to 57.7% for the baseline and exhibited a lower log-perplexity of 1.868 versus 1.951 for the baseline, indicating enhanced model quality.
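For reference, the two quality metrics quoted above can be computed as in the short sketch below, assuming next-token accuracy means argmax agreement with the target token and log-perplexity means the average per-token negative log-likelihood (on that reading, a log-perplexity of 1.868 corresponds to a perplexity of about exp(1.868) ≈ 6.5).

```python
# Sketch of the evaluation quantities mentioned above, assuming `logits` of shape
# (batch, seq_len, vocab) and integer `targets` of shape (batch, seq_len).
import torch.nn.functional as F

def next_token_accuracy(logits, targets):
    # Fraction of positions where the greedy (argmax) prediction matches the target.
    return (logits.argmax(dim=-1) == targets).float().mean().item()

def log_perplexity(logits, targets):
    # Average negative log-likelihood per token; exp() of this is the perplexity.
    vocab = logits.size(-1)
    return F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1)).item()
```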

Key takeaways from the research include the following:

    SLMs can meaningfully assist LLM pre-training in two ways: as teachers supplying soft labels through knowledge distillation, and as selectors that surface challenging yet learnable data subsets.

    SALT's two-phase design hands off from SLM-guided training to standard self-supervised learning, so the LLM is not ultimately limited by the smaller model's capacity.

    The SALT-trained 2.8-billion-parameter model matched or exceeded the baseline on reading comprehension, commonsense reasoning, and natural language inference benchmarks while using only 70% of the training steps, roughly 28% less wall-clock time.

    SALT-trained models also achieved higher next-token prediction accuracy (58.99% vs. 57.7%), lower log-perplexity (1.868 vs. 1.951), and better few-shot and downstream performance after pre-training.

In conclusion, SALT effectively redefines the paradigm of LLM training by transforming smaller models into valuable training aids. Its innovative two-stage process achieves a rare balance of efficiency and effectiveness, making it a pioneering approach in machine learning. SALT will be instrumental in overcoming resource constraints, enhancing model performance, and democratizing access to cutting-edge AI technologies. This research underscores the importance of rethinking traditional practices and leveraging existing tools to achieve more with less.


