cs.AI updates on arXiv.org · July 14, 12:08
Pre-Training LLMs on a budget: A comparison of three optimizers

This article compares three LLM optimizers (AdamW, Lion, Sophia) across different base architectures and training strategies. It finds that Sophia achieves the lowest training and validation loss, Lion trains fastest, and AdamW delivers the best downstream evaluation results.

arXiv:2507.08472v1 Announce Type: cross Abstract: Optimizers play a decisive role in reducing pre-training times for LLMs and achieving better-performing models. In this study, we compare three major variants: the de facto standard AdamW, the simpler Lion, developed through an evolutionary search, and the second-order optimizer Sophia. For better generalization, we train with two different base architectures and use a single- and a multiple-epoch approach while keeping the number of tokens constant. Using the Maximal Update Parametrization and smaller proxy models, we tune relevant hyperparameters separately for each combination of base architecture and optimizer. We found that while the results from all three optimizers were in approximately the same range, Sophia exhibited the lowest training and validation loss, Lion was fastest in terms of training GPU hours, but AdamW led to the best downstream evaluation results.
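The study keeps the training setup fixed and swaps only the optimizer (with hyperparameters tuned separately per architecture and optimizer). The sketch below is a rough illustration of that pattern in PyTorch, not the authors' code: the Lion and Sophia imports assume the community `lion-pytorch` package and the reference `SophiaG` implementation, which the abstract does not specify.

```python
# Minimal sketch: swap AdamW / Lion / Sophia behind one interface so the same
# training loop can be reused per run. Hyperparameter values here are placeholders.
import torch
import torch.nn as nn


def build_optimizer(name: str, model: nn.Module, lr: float, weight_decay: float):
    """Return one of the three optimizers compared in the paper."""
    if name == "adamw":
        # De facto standard, available in core PyTorch.
        return torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    if name == "lion":
        # Assumption: the community `lion-pytorch` package (pip install lion-pytorch).
        from lion_pytorch import Lion
        return Lion(model.parameters(), lr=lr, weight_decay=weight_decay)
    if name == "sophia":
        # Assumption: the reference SophiaG implementation from the Sophia repository.
        from sophia import SophiaG
        return SophiaG(model.parameters(), lr=lr, weight_decay=weight_decay)
    raise ValueError(f"unknown optimizer: {name}")


# Toy usage: only the optimizer (and its separately tuned hyperparameters) changes
# between runs; model, data, and token budget stay constant, as in the paper.
model = nn.Linear(512, 512)
opt = build_optimizer("adamw", model, lr=3e-4, weight_decay=0.1)
for _ in range(10):
    loss = model(torch.randn(8, 512)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```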


Related tags

LLM optimizers · performance comparison · training strategies