MarkTechPost@AI July 14, 2024
Q-GaLore Released: A Memory-Efficient Training Approach for Pre-Training and Fine-Tuning Machine Learning Models


Q-GaLore is a new approach to training large language models (LLMs) that combines quantization with low-rank projection to reduce memory consumption significantly, making training more efficient and LLM training more accessible. Q-GaLore performs strongly in both pre-training and fine-tuning: it can train a 7B LLaMA model on a single NVIDIA RTX 4060 Ti while using less memory than methods such as LoRA and GaLore and outperforming QLoRA on the MMLU benchmark.

🤔 Q-GaLore is an innovative method for training large language models (LLMs) that combines quantization and low-rank projection to cut memory consumption sharply. It exploits the differing properties of gradient subspaces across layers and adopts an adaptive update strategy that reduces the number of SVD operations while preserving performance.

🚀 Q-GaLore excels in both pre-training and fine-tuning scenarios. It can train a 7B LLaMA model on a single NVIDIA RTX 4060 Ti, demonstrating its memory efficiency and practicality. In fine-tuning tasks, Q-GaLore reduces memory consumption by up to 50% compared with other methods while performing better on the MMLU benchmark.

💡 Q-GaLore's performance and efficiency have been validated across model sizes from 60 million to 7 billion parameters. For a 1-billion-parameter model, Q-GaLore achieves pre-training performance comparable to the original GaLore method while saving 29.68% of memory. Notably, Q-GaLore successfully pre-trains a 7B model within a 16 GB memory budget, with only a slight perplexity difference from the baseline models.

📈 Q-GaLore's significance lies in its ability to train high-performing LLMs on limited hardware. This approach makes LLM technology easier to obtain, giving a broader range of users and applications access to it.

Large Language Models (LLMs) have become critical tools in various domains due to their exceptional ability to understand and generate human language. These models, which often contain billions of parameters, require extensive computational resources for training and fine-tuning. The primary challenge lies in efficiently managing the memory and computational demands so that these models remain accessible to a wide range of users and applications.

Training LLMs is inherently memory-intensive, necessitating substantial hardware resources that only some users have readily available. Traditional methods demand large memory allocations to handle the numerous parameters and optimization states. For instance, training a LLaMA 7B model from scratch typically requires around 58 GB of memory: 14 GB for trainable parameters, 42 GB for Adam optimizer states and weight gradients, and 2 GB for activations. This high memory requirement poses a significant barrier to entry for researchers and developers without access to advanced hardware setups.
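
A quick back-of-the-envelope calculation reproduces that breakdown (a sketch, assuming BF16 storage at 2 bytes per value for weights, gradients, and both Adam moments; these storage assumptions are ours, not stated in the article):

```python
# Rough memory budget for training a 7B-parameter model from scratch.
# Assumption: BF16 (2 bytes) for weights, gradients, and Adam moments.
params = 7e9
bytes_per_value = 2

weights = params * bytes_per_value / 1e9            # 14 GB trainable parameters
grads = params * bytes_per_value / 1e9              # 14 GB weight gradients
adam_moments = 2 * params * bytes_per_value / 1e9   # 28 GB first/second moments
activations = 2                                     # ~2 GB (batch/sequence dependent)

total = weights + grads + adam_moments + activations
print(f"{total:.0f} GB")  # ~58 GB, matching the figure above
```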

Various techniques have been developed to address this problem, including designing smaller-scale LLMs, employing efficient scaling techniques, and incorporating sparsity into the training methodology. Among these, GaLore has emerged as a notable method, allowing full-parameter training of LLMs through low-rank gradient updates computed with Singular Value Decomposition (SVD). GaLore reduces memory usage by up to 63.3%, enabling a 7B model to be trained with just 24 GB of memory. However, that is still more memory than many commonly used devices offer; popular consumer GPUs such as the NVIDIA RTX 4060 Ti top out at 16 GB.
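
For intuition, GaLore's core mechanism can be sketched in a few lines of PyTorch (a simplified illustration rather than the reference implementation; the rank r, learning rate, and Adam hyperparameters here are hypothetical choices):

```python
import torch

def galore_step(weight, grad, opt_state, r=128, lr=1e-3, eps=1e-8):
    """One simplified GaLore-style update for a single 2-D weight matrix.

    The full gradient is projected into a rank-r subspace derived from
    its SVD, Adam-style moments are tracked only in that small subspace,
    and the resulting update is projected back to full size.
    """
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :r]                       # (m, r) projection matrix

    g_low = P.T @ grad                 # (r, n) low-rank gradient
    m, v = opt_state                   # Adam moments live in low-rank space
    m.mul_(0.9).add_(g_low, alpha=0.1)
    v.mul_(0.999).addcmul_(g_low, g_low, value=0.001)

    weight -= lr * (P @ (m / (v.sqrt() + eps)))  # project the update back up
    return weight
```

In practice the projection matrix is recomputed only periodically rather than on every step, which is what keeps the SVD cost tolerable; recomputing it on every call, as above, is purely for clarity.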

Researchers from the University of Texas at Austin, the University of Surrey, the University of Oxford, the California Institute of Technology, and Meta AI have introduced Q-GaLore to reduce memory consumption further and make LLM training more accessible. Q-GaLore combines quantization and low-rank projection to improve memory efficiency significantly. The method builds on two key observations: first, the gradient subspace exhibits diverse properties, with some layers stabilizing early in training while others change frequently; second, the projection matrices are highly resilient to low-bit quantization. Leveraging these insights, Q-GaLore adaptively updates the gradient subspace based on convergence statistics, maintaining performance while reducing the number of SVD operations. The model weights are kept in INT8 format and the projection matrices in INT4 format, which conserves memory aggressively.
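
The adaptive subspace update can be sketched as follows (a minimal illustration; the cosine-similarity test, threshold, and rank are our hypothetical stand-ins for the paper's convergence statistics):

```python
import torch
import torch.nn.functional as F

def maybe_update_subspace(grad, P_prev, r=128, threshold=0.99):
    """Lazily refresh one layer's projection matrix.

    Recomputes the SVD-based projection and compares it with the previous
    one, column by column; if the subspace has effectively stopped moving,
    the old projection is reused and future SVDs for this layer can be
    skipped or scheduled less often.
    """
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P_new = U[:, :r]
    # Per-column cosine similarity; abs() handles SVD sign ambiguity.
    sim = F.cosine_similarity(P_new, P_prev, dim=0).abs().mean()
    converged = bool(sim > threshold)
    return (P_prev if converged else P_new), converged
```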

Q-GaLore employs two main modules: low-precision training with low-rank gradients, and lazy layer-wise subspace exploration. The entire model, including the Adam optimizer states, is held in 8-bit precision, and the projection matrices are quantized to 4 bits, yielding a memory reduction of approximately 28.57% for gradient low-rank training. Stochastic rounding maintains training stability and approximates the high-precision training trajectory: it preserves small gradient contributions in expectation, so the method can follow a high-precision training path using only low-precision weights, without keeping a high-precision copy of the parameters.
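
Stochastic rounding keeps the expected value of an update unbiased even when the result must be stored at low precision. A minimal sketch (illustrative only; it rounds to an integer grid rather than to the INT8 weight format described above):

```python
import torch

def stochastic_round(x):
    """Round each element down or up with probability given by its
    fractional part, so that E[stochastic_round(x)] == x and tiny
    updates are not systematically lost, as they would be with
    round-to-nearest.
    """
    floor = torch.floor(x)
    frac = x - floor                                 # in [0, 1)
    return floor + (torch.rand_like(x) < frac).to(x.dtype)

# A +0.1 update applied 1000 times to integer-grid weights still moves
# them by ~100 on average; round-to-nearest would discard every step.
w = torch.zeros(10_000)
for _ in range(1000):
    w = stochastic_round(w + 0.1)
print(w.mean())  # ≈ 100
```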

In practical applications, Q-GaLore has performed exceptionally well in both pre-training and fine-tuning scenarios. During pre-training, Q-GaLore enabled an LLaMA-7B model to be trained from scratch on a single NVIDIA RTX 4060 Ti with only 16 GB of memory, a significant achievement that demonstrates the method's memory efficiency and practicality. In fine-tuning tasks, Q-GaLore reduced memory consumption by up to 50% compared to methods like LoRA and GaLore, while consistently outperforming QLoRA by up to 5.19 points on MMLU benchmarks at the same memory cost.

Q-GaLore's performance and efficiency were evaluated across model sizes from 60 million to 7 billion parameters. For a 1-billion-parameter model, Q-GaLore maintained comparable pre-training performance, with an increase in perplexity of less than 0.84 relative to the original GaLore method, while saving 29.68% of memory against GaLore and 60.51% against the full baseline. Notably, Q-GaLore made it possible to pre-train a 7B model within a 16 GB memory constraint, with a perplexity difference of less than one compared to the baseline models.

In conclusion, Q-GaLore offers a practical solution to the memory constraints traditionally associated with efficient LLM training. By combining quantization and low-rank projection, Q-GaLore achieves competitive performance while broadening access to powerful language models. The method shows how large-scale models can be optimized for commonly available hardware, making cutting-edge language processing technology accessible to a wider audience.


Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter

Join our Telegram Channel and LinkedIn Group.

If you like our work, you will love our newsletter.

Don’t Forget to join our 46k+ ML SubReddit

