MarkTechPost@AI, December 28, 2024
YuLan-Mini: A 2.42B Parameter Open Data-efficient Language Model with Long-Context Capabilities and Advanced Training Techniques

YuLan-Mini is an efficient 2.42-billion-parameter language model developed by researchers at the Gaoling School of Artificial Intelligence, Renmin University of China. Using data-efficient methods, it achieves performance comparable to large industry models under limited computational resources. The model is trained on publicly available data and adopts an innovative architecture, including embedding tying, Rotary Positional Embedding (RoPE), and the SwiGLU activation function, which together improve training efficiency and model performance. YuLan-Mini is particularly strong at long-text processing, with a context length extended to 28,672 tokens, and it delivers excellent results across a range of benchmarks, demonstrating its potential to reach high performance under resource constraints.

🚀 With only 2.42 billion parameters, YuLan-Mini matches the performance of large industry models under limited computational resources through data-efficient training, underscoring its efficiency.

🔗 The model adopts several innovative techniques: embedding tying to reduce parameter count, Rotary Positional Embedding (RoPE) to handle long texts effectively, and the SwiGLU activation function to improve data representation. Together, these techniques strengthen training stability and performance.

⏱️ YuLan-Mini's context length extends to 28,672 tokens, giving it a clear advantage in long-text processing while preserving high accuracy on short-text tasks. This dual capability sets it apart from many other models.

📊 YuLan-Mini achieves strong results on benchmarks such as HumanEval, MATH-500, and MMLU, demonstrating its competitiveness in zero-shot, four-shot, and five-shot settings and its strong generalization ability.

💡 The research highlights the importance of a carefully designed data pipeline, systematic optimization, and an annealing strategy, which reduce reliance on massive datasets and prevent problems such as gradient explosions during training, offering valuable lessons for resource-constrained AI research.

Large language models (LLMs) built on transformer architectures depend heavily on pre-training with large-scale data to predict sequential tokens. This complex and resource-intensive process requires enormous computational infrastructure and well-constructed data pipelines. The growing demand for efficient and accessible LLMs has led researchers to explore techniques that balance resource use and performance, with an emphasis on achieving competitive results without industry-scale resources.
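Since everything that follows builds on this pre-training objective, here is a minimal PyTorch sketch of the standard next-token (causal language modeling) loss. It is a generic illustration, not YuLan-Mini's training code, and the tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
    """Standard causal language-modeling loss.

    logits:    (batch, seq_len, vocab_size) model outputs
    token_ids: (batch, seq_len) input token ids
    Each position is trained to predict the *next* token, so the targets
    are the inputs shifted left by one, and the final logit is dropped.
    """
    shifted_logits = logits[:, :-1, :].contiguous()
    targets = token_ids[:, 1:].contiguous()
    return F.cross_entropy(
        shifted_logits.view(-1, shifted_logits.size(-1)),
        targets.view(-1),
    )
```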

The development of LLMs is fraught with challenges, especially around computation and data efficiency. Pre-training models with billions of parameters demands advanced techniques and substantial infrastructure. High-quality data and robust training methods are crucial, as models face gradient instability and performance degradation during training. Open-source LLMs often struggle to match proprietary counterparts because of limited access to computational power and high-caliber datasets. The challenge, therefore, lies in creating efficient, high-performing models that enable smaller research groups to participate actively in advancing AI technology. Solving this problem requires innovation in data handling, training stabilization, and architectural design.

Existing research on LLM training emphasizes structured data pipelines, using techniques such as data cleaning, dynamic scheduling, and curriculum learning to improve learning outcomes. However, stability remains a persistent issue: large-scale training is susceptible to gradient explosions, loss spikes, and other technical difficulties that require careful optimization. Training long-context models introduces additional complexity, as the computational demands of attention mechanisms grow quadratically with sequence length. Existing approaches such as advanced optimizers, initialization strategies, and synthetic data generation help alleviate these issues but often fall short when scaled to full-sized models. The need for scalable, stable, and efficient methods in LLM training is more urgent than ever.
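The article does not spell out any one stabilization recipe, but two safeguards commonly used against the gradient explosions and loss spikes mentioned above are gradient-norm clipping and skipping updates whose loss jumps far above the recent average. The sketch below illustrates both in PyTorch; the `spike_factor` threshold and the assumption that the model returns a `.loss` attribute are illustrative, not taken from the paper.

```python
import torch

def training_step(model, batch, optimizer, loss_history,
                  max_grad_norm=1.0, spike_factor=3.0):
    """One optimization step with two common stabilization safeguards:
    gradient-norm clipping and a crude loss-spike skip rule.
    (Illustrative only; not the YuLan-Mini training code.)
    """
    loss = model(**batch).loss  # assumes a HF-style model that returns .loss
    # Skip the update if the loss jumps far above the recent average,
    # a cheap guard against the loss spikes discussed above.
    if loss_history and loss.item() > spike_factor * (sum(loss_history) / len(loss_history)):
        optimizer.zero_grad(set_to_none=True)
        return loss.item()
    loss.backward()
    # Cap the global gradient norm to prevent exploding gradients.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    loss_history.append(loss.item())
    return loss.item()
```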

Researchers at the Gaoling School of Artificial Intelligence, Renmin University of China, developed YuLan-Mini, a 2.42-billion-parameter language model that improves computational efficiency and performance through data-efficient methods. By leveraging publicly available data and focusing on data-efficient training techniques, YuLan-Mini achieves performance comparable to much larger industry models.

YuLan-Mini’s architecture incorporates several innovative elements to enhance training efficiency. Its decoder-only transformer design employs embedding tying to reduce parameter count and improve training stability. The model uses Rotary Positional Embedding (RoPE) to handle long contexts effectively, extending its context length to 28,672 tokens, an advance over typical models of this size. Other key features include SwiGLU activation functions for better data representation and a carefully designed annealing strategy that stabilizes training while maximizing learning efficiency. Synthetic data played a critical role, supplementing the 1.08 trillion tokens of training data sourced from open web pages, code repositories, and mathematical datasets. Together, these features enable YuLan-Mini to deliver robust performance on a limited computing budget.
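To make these ingredients concrete, here is a compact PyTorch sketch of a decoder-only block combining RoPE-rotated attention, a SwiGLU feed-forward layer, and an output head tied to the input embedding matrix. All dimensions (`dim=512`, `n_heads=8`, four layers) are illustrative placeholders; this is a generic sketch under those assumptions, not YuLan-Mini's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply Rotary Positional Embedding (RoPE) to a (batch, heads, seq, head_dim) tensor."""
    _, _, seq_len, dim = x.shape
    half = dim // 2
    freqs = 1.0 / (base ** (torch.arange(0, half, device=x.device).float() / half))
    angles = torch.arange(seq_len, device=x.device).float()[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()           # (seq, half) each
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: silu(W1 x) * (W3 x), projected back by W2."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)
        self.w3 = nn.Linear(dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

class DecoderBlock(nn.Module):
    """Pre-norm decoder-only transformer block with RoPE attention and SwiGLU."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.ffn = SwiGLU(dim, 4 * dim)

    def forward(self, x):
        b, s, d = x.shape
        q, k, v = self.qkv(self.norm1(x)).chunk(3, dim=-1)
        shape = (b, s, self.n_heads, self.head_dim)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        q, k = rotary_embedding(q), rotary_embedding(k)   # rotate queries and keys
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.out(attn.transpose(1, 2).reshape(b, s, d))
        return x + self.ffn(self.norm2(x))

class TinyCausalLM(nn.Module):
    """Minimal decoder-only LM with embedding tying: the output head reuses the embedding weights."""
    def __init__(self, vocab_size: int, dim: int = 512, n_heads: int = 8, n_layers: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.blocks = nn.ModuleList(DecoderBlock(dim, n_heads) for _ in range(n_layers))
        self.norm = nn.LayerNorm(dim)

    def forward(self, token_ids):
        x = self.embed(token_ids)
        for block in self.blocks:
            x = block(x)
        # Embedding tying: project back to the vocabulary with the embedding matrix itself.
        return self.norm(x) @ self.embed.weight.T
```

Tying the output projection to `self.embed.weight` removes an entire vocabulary-sized weight matrix, which is where the parameter savings from embedding tying come from.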

YuLan-Mini scored 64.00 on HumanEval in zero-shot settings, 37.80 on MATH-500 in four-shot settings, and 49.10 on MMLU in five-shot settings. These results underscore its competitive edge: its performance is comparable to that of much larger, more resource-intensive counterparts. The context-length extension to 28K tokens allows YuLan-Mini to excel in long-text scenarios while maintaining high accuracy on short-text tasks, a dual capability that sets it apart from many existing models, which often sacrifice one for the other.

Key takeaways from the research include:

- With only 2.42 billion parameters and 1.08 trillion training tokens drawn from publicly available and synthetic sources, YuLan-Mini matches the performance of much larger industry models.
- Embedding tying, RoPE, SwiGLU activations, and a carefully designed annealing strategy improve training stability and efficiency.
- The context length extends to 28,672 tokens without sacrificing accuracy on short-text tasks.
- Scores of 64.00 on HumanEval (zero-shot), 37.80 on MATH-500 (four-shot), and 49.10 on MMLU (five-shot) demonstrate strong generalization.
- Careful data pipelines, systematic optimization, and the annealing strategy reduce dependence on massive datasets and guard against gradient explosions during training.

In conclusion, YuLan-Mini is a notable addition to the evolving landscape of efficient LLMs. Its ability to deliver high performance with limited resources addresses critical barriers to AI accessibility. The research team’s focus on innovative techniques, from data efficiency to training stability, shows how smaller-scale research can contribute significantly to the field. Trained on just 1.08T tokens, YuLan-Mini sets a benchmark for resource-efficient LLMs.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
