MarkTechPost@AI · April 12, 01:15
LLMs No Longer Require Powerful Servers: Researchers from MIT, KAUST, ISTA, and Yandex Introduce a New AI Approach to Rapidly Compress Large Language Models without a Significant Loss of Quality

A joint team from Yandex Research, MIT, KAUST, and ISTA has developed HIGGS, a novel compression technique for large language models (LLMs). HIGGS can compress LLMs without additional data or resource-intensive parameter optimization. Unlike traditional methods, it requires no special hardware or powerful GPUs: quantization can be completed in minutes on a smartphone or laptop, with minimal loss of quality. The method has already been applied successfully to the LLaMA 3.1 and 3.2 families as well as to DeepSeek and Qwen models. HIGGS lowers the barrier to testing and deploying new models on consumer-grade devices, making LLMs far more widely accessible.

💡 HIGGS is a novel LLM compression method, developed jointly by Yandex Research and partner institutions, that compresses LLMs without requiring additional data or parameter optimization.

📱 Its main advantage is that it needs no expensive hardware such as powerful GPUs: model quantization can be completed quickly on a smartphone or laptop, with minimal loss of quality.

🚀 HIGGS has already been applied successfully to several popular LLMs, including the LLaMA 3.1 and 3.2 families as well as the DeepSeek and Qwen families, demonstrating its practical feasibility and broad applicability.

The Yandex Research team, together with researchers from the Massachusetts Institute of Technology (MIT), the Institute of Science and Technology Austria (ISTA), and the King Abdullah University of Science and Technology (KAUST), developed a method to rapidly compress large language models without a significant loss of quality.

Previously, deploying large language models on mobile devices or laptops required a quantization step that took anywhere from hours to weeks and had to be run on industrial servers to maintain good quality. Now, quantization can be completed in a matter of minutes right on a smartphone or laptop, without industry-grade hardware or powerful GPUs.

HIGGS lowers the barrier to entry for testing and deploying new models on consumer-grade devices, like home PCs and smartphones, by removing the need for industrial computing power.

The innovative compression method furthers the company’s commitment to making large language models accessible to everyone, from major players, SMBs, and non-profit organizations to individual contributors, developers, and researchers. Last year, Yandex researchers collaborated with major science and technology universities to introduce two novel LLM compression methods: Additive Quantization of Large Language Models (AQLM) and PV-Tuning. Combined, these methods can reduce model size by up to 8 times while maintaining 95% response quality.

Breaking Down LLM Adoption Barriers

Large language models require substantial computational resources, which makes them inaccessible and cost-prohibitive for most organizations. This is also true for open-source models, like the popular DeepSeek R1, which can't be easily deployed on even the most advanced servers designed for model training and other machine learning tasks.

As a result, access to these powerful models has traditionally been limited to a select few organizations with the necessary infrastructure and computing power, despite their public availability. 

However, HIGGS can pave the way for broader accessibility. Developers can now reduce model size without sacrificing quality and run them on more affordable devices. For example, this method can be used to compress LLMs like DeepSeek R1 with 671B parameters and Llama 4 Maverick with 400B parameters, which previously could only be quantized (compressed) with a significant loss in quality. This quantization technique unlocks new ways to use LLMs across various fields, especially in resource-constrained environments. Now, startups and independent developers can leverage compressed models to build innovative products and services, while cutting costs on expensive equipment. 

Yandex is already using HIGGS to accelerate prototyping, product development, and idea testing, as compressed models enable faster experimentation than their full-scale counterparts.

About the Method 

HIGGS (Hadamard Incoherence with Gaussian MSE-optimal GridS) compresses large language models without requiring additional data or gradient descent methods, making quantization more accessible and efficient for a wide range of applications and devices. This is particularly valuable when there’s a lack of suitable data for calibrating the model. The method offers a balance between model quality, size, and quantization complexity, making it possible to use the models on a wide range of devices like smartphones and consumer laptops.
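The method's name spells out its two main ingredients: an incoherence-inducing Hadamard rotation, which makes weight entries behave approximately like i.i.d. Gaussian samples, and a scalar grid that is MSE-optimal for Gaussian data, which removes the need for calibration data. The toy Python sketch below illustrates those two steps on a random weight vector; the group size, Lloyd-Max grid fitting, and per-vector scaling are illustrative assumptions on our part, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import hadamard

# Toy sketch of the two ingredients named by HIGGS, NOT the authors'
# implementation: group size, grid fitting, and scaling are assumptions.

def hadamard_rotate(w, group=64):
    """Rotate weight groups with an orthonormal Hadamard matrix so that
    entries become 'incoherent', i.e. approximately i.i.d. Gaussian."""
    H = hadamard(group) / np.sqrt(group)
    return (w.reshape(-1, group) @ H).ravel()

def gaussian_mse_grid(levels=16, n=200_000, iters=30, seed=0):
    """Fit an MSE-optimal scalar grid for N(0, 1) samples via Lloyd-Max."""
    x = np.random.default_rng(seed).standard_normal(n)
    grid = np.quantile(x, np.linspace(0, 1, levels + 2)[1:-1])  # quantile init
    for _ in range(iters):
        cells = np.abs(x[:, None] - grid[None, :]).argmin(axis=1)
        grid = np.array([x[cells == k].mean() for k in range(levels)])
    return grid

def quantize(w, grid):
    """Scale to unit variance, then snap each weight to its nearest grid point."""
    scale = w.std()
    codes = np.abs(w[:, None] / scale - grid[None, :]).argmin(axis=1)
    return codes.astype(np.uint8), scale

w = np.random.default_rng(1).standard_normal(4096)   # stand-in weight vector
rotated = hadamard_rotate(w)
grid = gaussian_mse_grid(levels=16)                  # 16 levels ~ 4 bits
codes, scale = quantize(rotated, grid)
reconstructed = grid[codes] * scale
print("quantization MSE:", np.mean((rotated - reconstructed) ** 2))
```

Because the rotation makes every layer's weights look roughly Gaussian, one precomputed grid can be reused everywhere, which is what allows this style of quantization to run data-free and in minutes, even on modest hardware.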

HIGGS was tested on the LLaMA 3.1 and 3.2-family models, as well as on Qwen-family models. Experiments show that HIGGS outperforms other data-free quantization methods, including NF4 (4-bit NormalFloat) and HQQ (Half-Quadratic Quantization), in terms of quality-to-size ratio.

Developers and researchers can already access the method on Hugging Face or explore the research paper, which is available on arXiv. At the end of this month, the team will present their paper at NAACL, one of the world’s top conferences on AI. 
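As a rough illustration, loading a model through the Hugging Face integration might look like the sketch below. The HiggsConfig class name, the bits parameter, and any extra kernel dependencies are assumptions on our part; check the method's Hugging Face page and the paper for the authoritative usage.

```python
# Hypothetical usage sketch; verify class names and requirements against the
# official Hugging Face integration before relying on this.
from transformers import AutoModelForCausalLM, AutoTokenizer, HiggsConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=HiggsConfig(bits=4),  # quantize on load, data-free
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Quantization in one minute:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```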

Continuous Commitment to Advancing Science and Optimization

This is one of several papers Yandex Research presented on large language model quantization. For example, the team presented AQLM and PV-Tuning, two methods of LLM compression that can reduce a company’s computational budget by up to 8 times without significant loss in AI response quality. The team also built a service that lets users run an 8B model on a regular PC or smartphone via a browser-based interface, even without high computing power.

Beyond LLM quantization, Yandex has open-sourced several tools that optimize resources used in LLM training. For example, the YaFSDP library accelerates LLM training by as much as 25% and reduces GPU resources for training by up to 20%. 

Earlier this year, Yandex developers open-sourced Perforator, a tool for continuous real-time monitoring and analysis of servers and apps. Perforator highlights code inefficiencies and provides actionable insights, which helps companies reduce infrastructure costs by up to 20%. This could translate to potential savings in millions or even billions of dollars per year, depending on company size. 


Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don't forget to join our 85k+ ML SubReddit. Note: Thanks to the Yandex team for the thought leadership/resources for this article. The Yandex team has financially supported us for this content/article.

