Accumulator-Aware Post-Training Quantization for Large Language Models

cs.AI updates on arXiv.org, posted two days ago at 12:08

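This paper introduces AXE, a quantization framework that aims to give PTQ algorithms overflow-avoidance guarantees, and demonstrates its effectiveness and flexibility through theoretical justification and experimental validation.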

arXiv:2409.17092v2 Announce Type: replace-cross

Abstract: When quantizing weights and activations to increasingly narrow representations, the cost of additions begins to dominate that of multiplications in multiply-accumulate (MAC) units. Recent studies show that reducing addition costs via low-precision accumulation improves throughput, power, and area across inference platforms, albeit with an increased risk of overflow. Accumulator-aware quantization research has so far only considered the quantization-aware training (QAT) paradigm, in which models are fine-tuned or trained from scratch with quantization in the loop. As models and datasets continue to grow in size, QAT techniques become increasingly expensive, which has motivated the recent surge in post-training quantization (PTQ) research. To bridge this gap, we introduce AXE, the first accumulator-aware quantization framework explicitly designed to endow PTQ algorithms with overflow-avoidance guarantees. We present theoretical motivation for AXE and demonstrate its flexibility by implementing it on top of two existing algorithms: GPFQ and OPTQ. We design AXE to support multi-stage accumulation, opening the door to full datapath optimization for the first time. We evaluate AXE using recent language generation models; when quantizing Llama3 8B for a 16-bit multi-stage accumulation datapath, AXE maintains up to 98% of the FP16 perplexity, surpassing naive bit width manipulation by up to 15%.
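To make the overflow risk concrete, the sketch below checks a standard worst-case bound for an integer dot product: the magnitude of the accumulated sum can never exceed the L1 norm of the weights times the largest input magnitude. It is an illustration of the general idea only, not the AXE algorithm itself. The function names (`max_abs_input`, `fits_accumulator`, `fits_multistage`), the tiling layout, and the example numbers are all assumptions made for this example and do not come from the paper.

```python
def max_abs_input(bits: int, signed: bool) -> int:
    """Largest magnitude an integer input of the given bit width can take."""
    return 2 ** (bits - 1) if signed else 2 ** bits - 1


def fits_accumulator(weights, input_bits, acc_bits, signed_inputs=False):
    """Return True if no input vector can overflow a signed accumulator.

    Uses the worst-case bound |sum_i w_i * x_i| <= sum_i |w_i| * max|x|.
    The limit 2**(acc_bits - 1) - 1 is the largest positive value of a
    signed two's-complement accumulator, so the check is conservative.
    """
    l1_norm = sum(abs(w) for w in weights)
    worst_case = l1_norm * max_abs_input(input_bits, signed_inputs)
    return worst_case <= 2 ** (acc_bits - 1) - 1


def fits_multistage(weights, input_bits, tile_size, inner_bits, outer_bits,
                    signed_inputs=False):
    """Hypothetical two-stage datapath: each tile of `tile_size` products is
    summed in an inner accumulator, and the tile sums are then added in an
    outer accumulator. Both stages must satisfy the worst-case bound."""
    tiles = [weights[i:i + tile_size] for i in range(0, len(weights), tile_size)]
    inner_ok = all(
        fits_accumulator(tile, input_bits, inner_bits, signed_inputs)
        for tile in tiles
    )
    # The sum of all tile sums is bounded by the bound on the full dot product.
    outer_ok = fits_accumulator(weights, input_bits, outer_bits, signed_inputs)
    return inner_ok and outer_ok


if __name__ == "__main__":
    w = [3, -2, 7, 1]                       # integer (already quantized) weights
    print(fits_accumulator(w, 8, 16))       # unsigned 8-bit inputs, 16-bit accumulator
    print(fits_accumulator(w, 8, 12))       # same weights, 12-bit accumulator
    print(fits_multistage(w, 8, 2, 12, 13)) # tiles of 2, 12-bit inner, 13-bit outer
```

A check like this only reports whether a given set of weights is safe. An accumulator-aware quantizer goes further and constrains the weights during quantization so that the bound holds by construction.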

Related tags

Quantization overflow avoidance, PTQ algorithms, AXE framework, datapath optimization
