MarkTechPost@AI December 26, 2024
Meet CoMERA: An Advanced Tensor Compression Framework Redefining AI Model Training with Speed and Precision

CoMERA is a new AI model training framework that uses adaptive tensor compression to improve training speed and memory efficiency. It takes a multi-objective optimization approach that balances compression ratio against model accuracy, and it uses tensorized embeddings and CUDA Graph to improve GPU utilization. Compared with conventional methods, CoMERA significantly accelerates training while reducing memory consumption, and it demonstrates strong performance across a range of models and datasets. The framework's key innovation is its adaptive tensor representation, which lets model layers dynamically adjust their ranks under resource constraints, achieving compression without sacrificing performance.

🚀 Through adaptive tensor representations, CoMERA achieves layer compression ratios of up to 361x and model compression ratios of up to 99x, sharply reducing storage and memory requirements.

⏱️ The framework delivers 2-3x faster training on transformers and recommendation systems, saving substantial compute resources and time.

💾 Using tensorized representations and CUDA Graph, CoMERA cuts peak memory consumption by 7x, making it feasible to train large models on smaller GPUs.

💡 CoMERA supports a variety of architectures, including transformers and large language models, while maintaining or even improving accuracy.

🌱 By lowering the energy and resource demands of training, CoMERA helps advance more sustainable AI practices and makes advanced models accessible to a broader audience.

Training large-scale AI models such as transformers and language models has become an indispensable yet highly demanding process in AI. With billions of parameters, these models offer groundbreaking capabilities but come at a steep cost in computational power, memory, and energy consumption. For example, OpenAI’s GPT-3, with 175 billion parameters, requires weeks of GPU training. Such massive requirements limit these technologies to organizations with substantial computational resources, exacerbating concerns over energy efficiency and environmental impact. Addressing these challenges has become critical to ensuring the broader accessibility and sustainability of AI advancements.

The inefficiencies in training large models stem primarily from their reliance on dense matrices, which demand significant memory and computing power. The limited support for optimized low-precision or low-rank operations in modern GPUs further compounds these requirements. While some methods, such as matrix factorization and heuristic rank reduction, have been proposed to alleviate these issues, their real-world applicability is constrained. For instance, GaLore enables training on single-batch settings but suffers from impractical runtime overhead. Similarly, LTE, which adopts low-rank adapters, struggles with convergence on large-scale tasks. The lack of a method that simultaneously reduces memory usage, computational cost, and training time without compromising performance has created an urgent need for innovative solutions.

Researchers from the University at Albany SUNY, the University of California at Santa Barbara, Amazon Alexa AI, and Meta introduced CoMERA (Computing- and Memory-Efficient training via Rank-Adaptive tensor optimization), a novel framework that combines memory efficiency with computational speed through rank-adaptive tensor compression. Unlike traditional methods that focus solely on compression, CoMERA adopts a multi-objective optimization approach to balance compression ratio and model accuracy. It uses tensorized embeddings and advanced tensor-network contractions to optimize GPU utilization, reducing runtime overhead while maintaining robust performance. The framework also introduces CUDA Graph to minimize kernel-launching delays during GPU operations, a significant bottleneck in traditional tensor compression approaches.
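To make the CUDA Graph point concrete: tensor-compressed layers replace one large matrix multiplication with many small contractions, so per-kernel launch latency becomes a real bottleneck. The following is a minimal sketch of whole-step graph capture using PyTorch's public `torch.cuda.CUDAGraph` API; it is not CoMERA's implementation, and the model, shapes, and optimizer here are placeholder assumptions.

```python
import torch

# Placeholder model standing in for a tensor-compressed network:
# many small layers -> many small kernel launches per step.
model = torch.nn.Sequential(
    *[torch.nn.Linear(256, 256) for _ in range(12)]
).cuda()
loss_fn = torch.nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

# Static input/target buffers: graph replay reuses fixed memory addresses.
static_x = torch.randn(64, 256, device="cuda")
static_y = torch.randn(64, 256, device="cuda")

# Warm up on a side stream before capture, as the capture API requires.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        opt.zero_grad(set_to_none=True)
        loss_fn(model(static_x), static_y).backward()
        opt.step()
torch.cuda.current_stream().wait_stream(s)

# Capture one full training step (forward, backward, update) into a graph.
g = torch.cuda.CUDAGraph()
opt.zero_grad(set_to_none=True)
with torch.cuda.graph(g):
    static_loss = loss_fn(model(static_x), static_y)
    static_loss.backward()
    opt.step()

# Replay: copy each new batch into the static buffers, then launch the
# whole step as one graph instead of hundreds of individual kernels.
for step in range(100):
    static_x.copy_(torch.randn(64, 256, device="cuda"))
    static_y.copy_(torch.randn(64, 256, device="cuda"))
    g.replay()
```

The static buffers are the defining constraint of graph capture: replay reuses fixed memory addresses, so each new batch is copied into place instead of being freshly allocated.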

CoMERA’s foundation is adaptive tensor representations, which allow model layers to adjust their ranks dynamically based on resource constraints. By modifying tensor ranks, the framework achieves compression without compromising the integrity of neural network operations. This dynamic optimization works through a two-stage training process, sketched in the example below:

1. An early stage focused on stable convergence
2. A late stage that fine-tunes ranks to meet specific compression targets
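As a deliberately simplified illustration of rank adaptivity, the sketch below factors a single linear layer into two thin matrices and shrinks its rank in a late stage. It uses a plain matrix factorization rather than CoMERA's tensor-network format, and the norm-based truncation rule is a stand-in for the paper's multi-objective rank optimization; `LowRankLinear`, `truncate_rank`, and the threshold value are hypothetical, not from the paper.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Weight W (out x in) factored as B @ A with an adjustable rank r."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.02)
        self.B = nn.Parameter(torch.randn(out_features, rank) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Two thin matmuls instead of one dense one: compute and memory
        # scale with the current rank rather than in_features * out_features.
        return x @ self.A.t() @ self.B.t() + self.bias

    @torch.no_grad()
    def truncate_rank(self, threshold: float) -> None:
        # Late-stage step: score each rank-1 component by the product of
        # its factor norms, keeping only components above the threshold.
        scores = self.A.norm(dim=1) * self.B.norm(dim=0)
        keep = scores > threshold
        keep[scores.argmax()] = True  # never drop below rank 1
        self.A = nn.Parameter(self.A[keep].contiguous())
        self.B = nn.Parameter(self.B[:, keep].contiguous())

layer = LowRankLinear(512, 512, rank=64)
# ... early stage: train normally until convergence is stable ...
layer.truncate_rank(threshold=0.05)    # late stage: shrink the rank
print(layer.A.shape[0], "ranks kept")  # compressed rank after truncation
```

Note that truncation re-creates the layer's parameters, so a real training loop would also need to rebuild the optimizer state for this layer; afterwards, compute and storage scale with the surviving rank rather than with the full weight matrix.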

In a six-encoder transformer model, CoMERA achieved compression ratios ranging from 43x in its early stage to 361x in its late-stage optimizations. It also reduced memory consumption by 9x compared to GaLore and trained 2-3x faster per epoch.

When applied to transformer models trained on the MNLI dataset, CoMERA reduced model sizes from 256 MB to as little as 3.2 MB while preserving accuracy. In large-scale recommendation systems like DLRM, it compressed models by 99x and cut peak memory usage by 7x. The framework also performed well in pre-training CodeBERT, a domain-specific large language model, achieving a 4.23x overall compression ratio and a 2x speedup during certain training phases. These results underscore its ability to handle diverse tasks and architectures across domains.

The key takeaways from this research are as follows:

- Rank-adaptive tensor representations compress individual layers by up to 361x and full models by up to 99x while preserving accuracy.
- Training runs 2-3x faster per epoch on transformers and DLRM-style recommendation models.
- Tensorized representations and CUDA Graph cut peak memory consumption by 7x, and memory use is 9x lower than GaLore's.
- The framework generalizes across architectures, from six-encoder transformers to CodeBERT, where it achieved a 4.23x overall compression ratio.
- Lower compute and energy demands make large-model training more sustainable and more widely accessible.

In conclusion, CoMERA addresses some of the most significant barriers to AI scalability and accessibility by enabling faster, memory-efficient training. Its adaptive optimization capabilities and compatibility with modern hardware make it a compelling choice for organizations seeking to train large models without incurring prohibitive costs. This study’s results pave the way for further exploration of tensor-based optimizations in domains like distributed computing and resource-constrained edge devices.


Check out the Paper. All credit for this research goes to the researchers of this project.
