MarkTechPost@AI, November 28, 2024
The Allen Institute for AI (AI2) Releases OLMo 2: A New Family of Open-Sourced 7B and 13B Language Models Trained on up to 5T Tokens

OLMo 2 is a family of open-source language models developed by the Allen Institute for AI, released in 7-billion- and 13-billion-parameter versions and trained on up to 5 trillion tokens. By improving training stability, adopting a staged training pipeline, and drawing on diverse datasets, the models narrow the performance gap with systems such as Llama 3.1 that are not fully open. OLMo 2 performs strongly in knowledge recall, reasoning, and general language capabilities, setting a new bar for open-source language models, demonstrating the potential of open collaboration in AI, and promoting more equitable technological development.

🤔**Improved training stability:** Techniques such as RMSNorm and learning-rate annealing reduce loss spikes during pretraining, keeping the training process stable and reliable.
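A minimal sketch of RMSNorm, one of the stabilization techniques named above, is shown below. PyTorch is assumed, and the hidden size, epsilon, and test shapes are illustrative rather than OLMo 2's actual configuration; learning-rate annealing is not shown.

```python
# Minimal sketch of RMSNorm; hyperparameters are illustrative, not OLMo 2's real values.
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescales activations without mean-centering."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the RMS over the last dimension, then apply a learned gain.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


if __name__ == "__main__":
    x = torch.randn(2, 16, 512)      # (batch, sequence, hidden)
    print(RMSNorm(512)(x).shape)     # torch.Size([2, 16, 512])
```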

🔄**Staged training:** Pretraining is split into two stages that use different datasets and training methods, with late-pretraining interventions such as data-curriculum adjustments that target specific capabilities like knowledge recall, reasoning, and general language ability.
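As an illustration of the staged idea, the hypothetical sketch below splits a fixed token budget into a large first stage on OLMo-Mix-1124 and a smaller late-stage curriculum on Dolmino-Mix-1124. The dataset names and the rough 90/10 split come from the article; the `Stage` class and the `train_on` placeholder are assumptions for illustration only.

```python
# Hypothetical two-stage data curriculum in the spirit of OLMo 2's staged pretraining.
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    dataset: str
    token_budget: float  # fraction of the total pretraining budget


CURRICULUM = [
    Stage("pretraining", "OLMo-Mix-1124", token_budget=0.90),
    Stage("late-stage curriculum", "Dolmino-Mix-1124", token_budget=0.10),
]


def train_on(dataset: str, tokens: int) -> None:
    # Placeholder for the actual training loop.
    print(f"training on {dataset} for {tokens:,} tokens")


def run_curriculum(total_tokens: int) -> None:
    for stage in CURRICULUM:
        train_on(stage.dataset, int(total_tokens * stage.token_budget))


if __name__ == "__main__":
    run_curriculum(total_tokens=5_000_000_000_000)  # "up to 5T tokens"
```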

📊**An actionable evaluation framework:** The team introduced OLMES (Open Language Modeling Evaluation System), a suite of 20 benchmarks covering knowledge recall, reasoning, and language understanding, giving model development and progress tracking a structured, reproducible yardstick.
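The sketch below shows, in schematic form, how a fixed benchmark suite of this kind can be used to track progress; it is not the actual OLMES tooling. The task names, the `evaluate_suite` helper, and the stub evaluator are hypothetical, and the model identifier is only an example.

```python
# Schematic benchmark-suite evaluation loop; not the real OLMES API.
from typing import Callable, Dict

TASKS = ["arc_challenge", "hellaswag", "mmlu", "gsm8k"]  # OLMES spans 20 such tasks


def evaluate_suite(model_name: str,
                   evaluate_task: Callable[[str, str], float]) -> Dict[str, float]:
    """Run every task in the suite and report per-task accuracy plus the average."""
    scores = {task: evaluate_task(model_name, task) for task in TASKS}
    scores["average"] = sum(scores.values()) / len(TASKS)
    return scores


if __name__ == "__main__":
    # Stub evaluator so the sketch runs end to end; swap in a real harness in practice.
    fake_eval = lambda model, task: 0.5
    print(evaluate_suite("OLMo-2-7B", fake_eval))
```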

📚**Dataset diversity and quality:** Pretraining on mixtures such as Dolmino-Mix-1124, which combines web and domain-specific content, helps the models generalize across application areas and strengthens their general language ability.
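The short sketch below illustrates one common way to realize such a mixture, by sampling training sources in fixed proportions; the source names and weights are hypothetical and do not reflect Dolmino-Mix-1124's real composition.

```python
# Illustrative weighted sampling over data sources; weights are hypothetical.
import random
from typing import Dict, Iterator

MIX_WEIGHTS: Dict[str, float] = {
    "web": 0.60,
    "math": 0.15,
    "code": 0.15,
    "encyclopedic": 0.10,
}


def sample_sources(weights: Dict[str, float], seed: int = 0) -> Iterator[str]:
    """Yield an endless stream of source names, drawn in proportion to their weights."""
    rng = random.Random(seed)
    names, probs = zip(*weights.items())
    while True:
        yield rng.choices(names, weights=probs, k=1)[0]


if __name__ == "__main__":
    stream = sample_sources(MIX_WEIGHTS)
    print([next(stream) for _ in range(10)])
```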

The development of language modeling focuses on creating artificial intelligence systems that can process and generate text with human-like fluency. These models play critical roles in machine translation, content generation, and conversational AI applications. They rely on extensive datasets and complex training algorithms to learn linguistic patterns, enabling them to understand context, respond to queries, and create coherent text. The rapid evolution in this field highlights the growing importance of open-source contributions, which aim to democratize access to powerful AI systems.

A persistent issue in the field has been the dominance of proprietary models, which often outperform open-source systems due to their extensive resources and optimized training pipelines. Proprietary systems frequently leverage massive datasets, compute power, and advanced training methodologies, creating a performance gap that open models struggle to close. This disparity limits accessibility and innovation in AI, as only well-funded organizations can afford to develop such cutting-edge technology.

While commendable, current open-source efforts have yet to fully address the challenges of scalability, training stability, and model performance. Many models are either partially open, releasing only limited datasets or methodologies, or fully open but lacking a competitive edge over their proprietary counterparts. Recent advances, however, are paving the way for a new generation of models that are both fully open and competitive in performance.

The Allen Institute for AI research team introduced OLMo 2, a groundbreaking family of open-source language models. These models, available in 7 billion (7B) and 13 billion (13B) parameter configurations, were trained on up to 5 trillion tokens using state-of-the-art techniques. By refining training stability, adopting a staged training process, and incorporating diverse datasets, the researchers closed the performance gap with systems such as Llama 3.1 that are not fully open. OLMo 2 leverages improvements in layer normalization, rotary positional embeddings, and Z-loss regularization to enhance model robustness.
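As a concrete example of one of these ingredients, the sketch below shows Z-loss regularization added to a standard cross-entropy objective: a small penalty on the squared log of the softmax normalizer keeps output logits well scaled. PyTorch is assumed, and the coefficient and tensor shapes are illustrative, not OLMo 2's actual settings.

```python
# Minimal sketch of Z-loss regularization on top of cross-entropy; values are illustrative.
import torch
import torch.nn.functional as F


def cross_entropy_with_z_loss(logits: torch.Tensor,
                              targets: torch.Tensor,
                              z_loss_coef: float = 1e-4) -> torch.Tensor:
    """Standard cross-entropy plus a small penalty on log(sum(exp(logits)))^2."""
    ce = F.cross_entropy(logits, targets)
    log_z = torch.logsumexp(logits, dim=-1)   # log of the softmax partition function
    return ce + z_loss_coef * (log_z ** 2).mean()


if __name__ == "__main__":
    logits = torch.randn(8, 50_000, requires_grad=True)   # (batch, vocab)
    targets = torch.randint(0, 50_000, (8,))
    loss = cross_entropy_with_z_loss(logits, targets)
    loss.backward()
    print(float(loss))
```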

OLMo 2’s training employed a curriculum approach across two stages. In the first stage, covering 90% of the pretraining budget, the models were trained on the OLMo-Mix-1124 dataset, comprising 3.9 trillion tokens sourced from high-quality repositories such as DCLM and Starcoder. The second stage trained on Dolmino-Mix-1124, a curated dataset of 843 billion tokens featuring web-based and domain-specific content. Techniques like model souping, which merges checkpoints to optimize performance, were critical in producing the final versions of the 7B and 13B models.
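Model souping can be as simple as averaging the weights of compatible checkpoints. The hedged sketch below shows that basic mechanic with tiny placeholder checkpoints; OLMo 2's actual recipe may select or weight checkpoints differently.

```python
# Hedged sketch of "model souping": element-wise averaging of checkpoint weights.
from typing import Dict, List
import torch


def soup_state_dicts(state_dicts: List[Dict[str, torch.Tensor]]) -> Dict[str, torch.Tensor]:
    """Return the element-wise mean of a list of shape-compatible state dicts."""
    keys = state_dicts[0].keys()
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
            for k in keys}


if __name__ == "__main__":
    # Tiny stand-in checkpoints with identical shapes, just to show the mechanics.
    ckpts = [{"w": torch.full((2, 2), float(i))} for i in range(3)]
    print(soup_state_dicts(ckpts)["w"])   # tensor of 1.0s (mean of 0, 1, 2)
```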

The performance of OLMo 2 sets new benchmarks in the field of open-source language modeling. Compared to its predecessor, OLMo-0424, OLMo 2 demonstrates a significant boost across all evaluation tasks. OLMo 2 7B notably outperforms Llama-3.1 8B, and OLMo 2 13B surpasses Qwen 2.5 7B, despite utilizing fewer training FLOPs. Evaluation using the Open Language Modeling Evaluation System (OLMES), a suite of 20 benchmarks, confirmed these gains, highlighting strengths in knowledge recall, reasoning, and general language capabilities.

Key takeaways from the research include the following advancements:

- **Training stability:** RMSNorm and learning-rate annealing reduce loss spikes during pretraining.
- **Staged training:** Late-pretraining interventions, including data-curriculum adjustments, target specific model capabilities.
- **Actionable evaluation:** OLMES, a suite of 20 benchmarks, provides a structured basis for model development and progress tracking.
- **Dataset diversity and quality:** Pretraining on mixtures such as Dolmino-Mix-1124 supports generalization across domains.

In conclusion, OLMo 2’s achievements signify a shift in the language modeling landscape. By addressing challenges such as training stability and evaluation transparency, the researchers have set a new standard for open-source AI. These models close the gap with proprietary systems and demonstrate the potential of collaborative innovation in advancing artificial intelligence. The OLMo 2 initiative underscores the transformative power of open access to high-performance AI models, paving the way for more equitable technological advancements.


Check out the models on Hugging Face and the details. All credit for this research goes to the researchers of this project.


