MarkTechPost@AI — November 24, 2024
Researchers from MBZUAI and CMU Introduce Bi-Mamba: A Scalable and Efficient 1-bit Mamba Architecture Designed for Large Language Models in Multiple Sizes (780M, 1.3B, and 2.7B Parameters)

Bi-Mamba is a new architecture based on state-space models (SSMs), designed to address the computational cost and memory demands that large language models (LLMs) face when processing long sequences. By binarizing the key linear modules of the Mamba model, it achieves up to 80% storage compression while retaining performance comparable to full-precision models. Bi-Mamba delivers strong results across model sizes (780M, 1.3B, and 2.7B parameters), outperforming existing methods in both perplexity and downstream-task accuracy and demonstrating its potential for deploying large language models in resource-constrained environments.

🤔 **By selectively binarizing its linear modules, the Bi-Mamba architecture sharply reduces storage requirements, achieving up to 80% compression; for example, the 2.7B-parameter model shrinks from 5.03 GB to 0.55 GB.** This greatly eases the storage burden of deployment and makes the model far easier to use in resource-constrained environments.
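
A quick back-of-envelope check of these figures (assuming an FP16 baseline at 2 bytes per parameter and 1 bit per binarized weight, with sizes in GiB; the share of components left at full precision is inferred, not reported):

```python
# Rough sanity check of the reported checkpoint sizes for the 2.7B model.
# Assumptions: FP16 baseline (2 bytes/parameter), 1 bit/parameter after
# binarization, sizes expressed in GiB.
params = 2.7e9

fp16_gib = params * 2 / 2**30      # ~5.03 GiB, matching the reported full-precision size
one_bit_gib = params / 8 / 2**30   # ~0.31 GiB if every weight were binarized

print(f"FP16 checkpoint:    {fp16_gib:.2f} GiB")
print(f"1-bit weights only: {one_bit_gib:.2f} GiB")
# The reported 0.55 GiB suggests roughly 0.2 GiB of components kept at full
# precision (embeddings, norms, scaling/shifting factors) - still a reduction
# of well over 80% relative to the FP16 checkpoint.
```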

🚀 **Bi-Mamba sharply reduces memory requirements while maintaining performance comparable to full-precision models.** On datasets such as Wiki2, PTB, and C4, Bi-Mamba's perplexity scores are significantly better than those of other methods such as GPTQ and Bi-LLM, demonstrating stable and reliable performance.

📊 **Bi-Mamba achieves strong results across model sizes (780M, 1.3B, and 2.7B parameters), demonstrating good scalability.** On downstream tasks such as BoolQ and HellaSwag, Bi-Mamba also reaches solid zero-shot accuracy, showing robustness across different tasks and datasets.

💡 **Bi-Mamba uses binarization-aware training, avoiding the performance degradation that conventional binarization methods incur.** By incorporating learnable scaling and shifting factors, Bi-Mamba keeps the binarized parameters closely aligned with their full-precision counterparts, effectively preserving model performance.

🌐 **Bi-Mamba was trained on a large dataset of 1.26 trillion tokens drawn from sources such as RefinedWeb and StarCoder, with a high-precision teacher model such as LLaMA2-7B guiding training.** This ensures the model's robustness and generalization, allowing it to adapt to a wide range of application scenarios.

The evolution of machine learning has brought significant advancements in language models, which are foundational to tasks like text generation and question-answering. Among these, transformers and state-space models (SSMs) are pivotal, yet their efficiency when handling long sequences has posed challenges. As sequence length increases, traditional transformers suffer from quadratic complexity, leading to prohibitive memory and computational demands. To address these issues, researchers and organizations have explored alternative architectures, such as Mamba, a state-space model with linear complexity that provides scalability and efficiency for long-context tasks.

Large-scale language models often face challenges in managing computational costs, especially as they scale to billions of parameters. For instance, while Mamba offers linear-complexity advantages, its growing size still entails significant energy consumption and training cost, making deployment difficult. These limitations are exacerbated by the high resource demands of GPT-style architectures, which are traditionally trained and run for inference at full precision (e.g., FP16 or BF16). Moreover, as demand grows for efficient, scalable AI, exploring extreme quantization methods has become critical to practical deployment in resource-constrained settings.

Researchers have explored techniques such as pruning, low-bit quantization, and key-value cache optimization to mitigate these challenges. Quantization, which reduces the bit-width of model weights, has shown promising results, compressing models without substantial performance degradation. However, most of these efforts focus on transformer-based models; the behavior of SSMs, particularly Mamba, under extreme quantization remains largely unexplored, leaving a gap in the development of scalable, efficient state-space models for real-world applications.

Researchers from the Mohamed bin Zayed University of Artificial Intelligence and Carnegie Mellon University introduced Bi-Mamba, a 1-bit scalable Mamba architecture designed for low-memory, high-efficiency scenarios. This innovative approach applies binarization-aware training to Mamba’s state-space framework, enabling extreme quantization while maintaining competitive performance. Bi-Mamba was developed in model sizes of 780 million, 1.3 billion, and 2.7 billion parameters and trained from scratch using an autoregressive distillation loss. The model uses high-precision teacher models such as LLaMA2-7B to guide training, ensuring robust performance.
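
The article does not spell out the distillation objective beyond calling it an autoregressive distillation loss; the snippet below is a minimal sketch of one common token-level formulation (KL divergence between the teacher's and the student's next-token distributions), with all function and variable names hypothetical.

```python
import torch
import torch.nn.functional as F

def autoregressive_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Token-level distillation: push the student's next-token distribution
    toward the teacher's at every position. Logits: (batch, seq_len, vocab)."""
    s = F.log_softmax(student_logits / temperature, dim=-1).flatten(0, 1)
    t = F.softmax(teacher_logits / temperature, dim=-1).flatten(0, 1)
    # KL(teacher || student), averaged over all token positions.
    return F.kl_div(s, t, reduction="batchmean") * (temperature ** 2)

# Hypothetical training step: a frozen full-precision teacher (e.g., LLaMA2-7B)
# provides soft targets for the 1-bit student.
# with torch.no_grad():
#     teacher_logits = teacher(input_ids).logits
# loss = autoregressive_distillation_loss(student(input_ids).logits, teacher_logits)
```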

The architecture of Bi-Mamba employs selective binarization of its linear modules while retaining other components at full precision to balance efficiency and performance. Input and output projections are binarized using FBI-Linear modules, which integrate learnable scaling and shifting factors for optimal weight representation. This ensures that binarized parameters align closely with their full-precision counterparts. The model’s training utilized 32 NVIDIA A100 GPUs to process large datasets, including 1.26 trillion tokens from sources like RefinedWeb and StarCoder.
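
The article describes the FBI-Linear modules only at a high level: binary weights combined with learnable scaling and shifting factors. The module below is a minimal sketch of that idea using a straight-through estimator (an assumption on our part; the paper's exact formulation may differ).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizedLinear(nn.Module):
    """Minimal sketch of a 1-bit linear layer with learnable per-output-channel
    scale and shift, in the spirit of the FBI-Linear modules described above."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.scale = nn.Parameter(torch.ones(out_features, 1))   # learnable scaling factor
        self.shift = nn.Parameter(torch.zeros(out_features, 1))  # learnable shifting factor

    def forward(self, x):
        w = self.weight
        # Binarize weights to {-1, +1}; the straight-through estimator keeps
        # gradients flowing to the latent full-precision weights during training.
        w_bin = torch.sign(w).detach() + w - w.detach()
        # Rescale and shift so the binary weights approximate their
        # full-precision counterparts.
        w_hat = self.scale * w_bin + self.shift
        return F.linear(x, w_hat)

# Usage sketch: a drop-in replacement for a Mamba block's input/output projections.
# layer = BinarizedLinear(2560, 5120); y = layer(torch.randn(4, 2560))
```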

Extensive experiments demonstrated Bi-Mamba's competitive edge over existing models. On datasets like Wiki2, PTB, and C4, Bi-Mamba achieved perplexity scores of 14.2, 34.4, and 15.0, significantly outperforming alternatives like GPTQ and Bi-LLM, whose perplexities were up to 10× higher. In addition, on downstream tasks such as BoolQ and HellaSwag, Bi-Mamba achieved zero-shot accuracies of 44.5% with the 780M model, 46.7% with the 1.3B variant, and 49.3% with the 2.7B model, demonstrating robustness across tasks and datasets while maintaining energy-efficient performance.

The study's findings highlight several key takeaways: extreme weight binarization can preserve competitive perplexity and zero-shot accuracy, storage and memory footprints shrink by roughly 80% or more, and the approach scales consistently across the 780M, 1.3B, and 2.7B models.

In conclusion, Bi-Mamba represents a significant step forward in addressing the dual challenges of scalability and efficiency in large language models. By leveraging binarization-aware training and focusing on key architectural optimizations, the researchers demonstrated that state-space models could achieve high performance under extreme quantization. This innovation enhances energy efficiency, reduces resource consumption, and sets the stage for future developments in low-bit AI systems, opening avenues for deploying large-scale models in practical, resource-limited environments. Bi-Mamba’s robust results underscore its potential as a transformative approach for more sustainable and efficient AI technologies.


Check out the Paper. All credit for this research goes to the researchers of this project.


