MarkTechPost@AI September 10, 2024
VQ4DiT: A Fast Post-Training Vector Quantization Method for DiTs (Diffusion Transformers Models)

VQ4DiT is a post-training vector quantization (VQ) method for DiT (Diffusion Transformer) models, designed to address the difficulty of deploying DiTs on edge devices caused by their large parameter counts and high computational complexity. By decomposing each layer's weights into a codebook and candidate assignment sets and applying a zero-data, block-wise calibration strategy, the method effectively calibrates both the codebook and the candidate assignments, reducing model size and computation while preserving model performance and offering a practical path to deploying DiTs on edge devices.

🤔 VQ4DiT is a post-training vector quantization (VQ) method for DiT models, designed to address the difficulty of deploying DiTs on edge devices caused by their large parameter counts and high computational complexity. Conventional post-training quantization (PTQ) methods lose significant accuracy at low bit-widths such as 2-bit, while traditional VQ methods calibrate only the codebook without adjusting the assignments, so weight sub-vectors end up assigned inaccurately and model performance suffers. VQ4DiT instead decomposes each layer's weights into a codebook and candidate assignment sets and uses a zero-data, block-wise calibration strategy to calibrate both effectively, reducing model size and computation while preserving performance.

🚀 VQ4DiT uses a zero-data, block-wise calibration strategy that calibrates the codebook and the candidate assignment sets simultaneously, minimizing the mean squared error between the outputs of the floating-point and quantized models at each timestep and DiT block. This keeps the quantized model's performance close to that of the floating-point model while avoiding the calibration collapse caused by accumulated quantization error. In this way, VQ4DiT can compress DiT models to 2-bit precision or even lower while maintaining high image-generation quality.

💻 VQ4DiT was validated experimentally on the DiT XL/2 model with the ImageNet 256×256 and 512×512 datasets, where it outperformed other quantization methods, including RepQ-ViT, GPTQ, and Q-DiT, across different sampling timesteps and weight bit-widths. Under 2-bit quantization, VQ4DiT still generates high-quality images while the other algorithms collapse. These results show that VQ4DiT can compress DiT models to 2-bit precision while maintaining high image-generation quality, providing a practical path to deploying DiTs on edge devices.

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

Text-to-image diffusion models have made significant strides in generating complex and faithful images from input conditions. Among these, Diffusion Transformer models (DiTs) have emerged as particularly powerful, with OpenAI's Sora being a notable application. DiTs, constructed by stacking multiple transformer blocks, exploit the scaling properties of transformers to achieve enhanced performance through flexible parameter expansion. While DiTs outperform UNet-based diffusion models in image quality, they face deployment challenges due to their large parameter counts and high computational complexity. For instance, generating a 256×256 image with the DiT XL/2 model takes over 17 seconds and 105 GFLOPs on an NVIDIA A100 GPU. This computational demand makes deploying DiTs on resource-constrained edge devices impractical, prompting researchers to explore efficient deployment methods, particularly model quantization.

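For a rough sense of what weight quantization buys here, a back-of-envelope calculation: the ~675M parameter figure for DiT XL/2 comes from the original DiT paper, and the rest is simple arithmetic that ignores the small codebook overhead.

```python
# Hypothetical back-of-envelope: weight memory at FP16 vs. an effective
# 2 bits per weight (codebook storage, typically a few MB, is ignored).
params = 675e6                       # approx. DiT XL/2 parameter count
fp16_mb = params * 16 / 8 / 1e6      # ~1350 MB at 16 bits per weight
vq2_mb = params * 2 / 8 / 1e6        # ~169 MB at 2 bits per weight
print(f"FP16: {fp16_mb:.0f} MB -> 2-bit: {vq2_mb:.0f} MB (8x smaller)")
```
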
Prior attempts to address the deployment challenges of diffusion models have primarily focused on model quantization techniques. Post-training quantization (PTQ) has been widely used because it can be applied quickly, without extensive fine-tuning, and vector quantization (VQ) has shown promise in compressing CNN models to extremely low bit-widths. However, both face limitations when applied to DiTs: PTQ methods significantly reduce model accuracy at very low bit-widths, such as 2-bit quantization, while traditional VQ methods calibrate only the codebook without adjusting assignments, so weight sub-vectors are assigned incorrectly and the codebook receives inconsistent gradients, leading to suboptimal results.

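For readers unfamiliar with weight VQ, the sketch below shows the conventional scheme the paper builds on: a weight matrix is split into short sub-vectors, a k-means codebook is fit, and each sub-vector stores only an index. All names and parameters are illustrative, not from the paper's code.

```python
# Minimal sketch of conventional weight vector quantization (the baseline
# that calibrates a codebook but keeps assignments fixed).
import numpy as np
from sklearn.cluster import KMeans

def vq_quantize(weight: np.ndarray, sub_dim: int = 4, k: int = 256):
    """Split a 2-D weight matrix into length-`sub_dim` sub-vectors, cluster
    them into a k-entry codebook, and store one index per sub-vector."""
    assert weight.shape[1] % sub_dim == 0
    subvecs = weight.reshape(-1, sub_dim)
    km = KMeans(n_clusters=k, n_init=4).fit(subvecs)
    return km.cluster_centers_, km.labels_   # codebook (k, sub_dim), assignments

def vq_dequantize(codebook, assignments, shape):
    """Reconstruct the weight by looking up each sub-vector's codeword."""
    return codebook[assignments].reshape(shape)

# Effective rate: log2(k) index bits amortized over sub_dim weights,
# e.g. k=256, sub_dim=4  ->  8 / 4 = 2 bits per weight (plus the codebook).
```
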
Applying classic uniform quantization (UQ) and VQ to the DiT XL/2 model reveals how hard it is to preserve performance at extremely low bit-widths. While VQ achieves lower quantization error than UQ, it still degrades noticeably at the 2-bit and 3-bit levels. The trade-off between codebook size, memory usage, and quantization error is a complex optimization problem; fine-tuning quantized DiTs on large datasets like ImageNet is computationally intensive and time-consuming; and the accumulation of quantization errors in these large-scale models leads to suboptimal results even after fine-tuning. The key issue lies in conflicting gradients for sub-vectors that share the same assignment, which hinder proper codeword updates.

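The gradient conflict is easy to see in a toy case (hypothetical numbers): when two sub-vectors share one codeword and the reconstruction loss pulls them in opposite directions, the summed update on the codeword cancels, as sketched below.

```python
import numpy as np

# Two weight sub-vectors assigned to the same codeword c.
w1, w2 = np.array([0.9, 0.1]), np.array([-0.9, -0.1])
c = np.array([0.0, 0.0])                  # the shared codeword

# Gradient of the reconstruction error ||c - w||^2 w.r.t. c is 2(c - w).
g1, g2 = 2 * (c - w1), 2 * (c - w2)
print(g1 + g2)  # [0. 0.] -- the two updates cancel exactly, so c never
                # moves, even though both reconstructions are poor.
```
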
To overcome these limitations, researchers from Zhejiang University and vivo Mobile Communication Co., Ltd. have developed VQ4DiT (Efficient Post-Training Vector Quantization for Diffusion Transformers), a robust approach that vector-quantizes DiTs efficiently and accurately without requiring a calibration dataset. VQ4DiT decomposes the weights of each layer into a codebook and candidate assignment sets, initializing every candidate assignment with an equal ratio. It then employs a zero-data, block-wise calibration strategy to calibrate the codebooks and candidate assignment sets simultaneously, minimizing the mean squared error between the outputs of the floating-point and quantized models at each timestep and DiT block. This ensures the quantized model performs close to its floating-point counterpart while avoiding calibration collapse caused by cumulative quantization errors.

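A simplified PyTorch-style sketch of that calibration idea as described above: each sub-vector keeps a small candidate set of codewords with learnable ratios (initialized equal), and the ratios and codebook are tuned block by block so the quantized block matches the floating-point block's output. The class and function names, the candidate-set size, and the optimizer settings are all illustrative assumptions, not the authors' implementation.

```python
import torch

class VQWeight(torch.nn.Module):
    """Soft-assigned VQ weight: each sub-vector mixes A candidate codewords."""
    def __init__(self, codebook, cand_idx):
        super().__init__()
        self.codebook = torch.nn.Parameter(codebook)   # (k, sub_dim)
        self.cand_idx = cand_idx                       # (n_sub, A) candidate ids
        # Zero logits -> softmax gives equal ratios over candidates at init.
        self.logits = torch.nn.Parameter(torch.zeros(cand_idx.shape))

    def forward(self, shape):
        ratios = self.logits.softmax(dim=-1)           # (n_sub, A)
        cands = self.codebook[self.cand_idx]           # (n_sub, A, sub_dim)
        w = (ratios.unsqueeze(-1) * cands).sum(dim=1)  # weighted sub-vectors
        return w.reshape(shape)

    def finalize(self):
        """After calibration, commit each sub-vector to its top-ratio candidate."""
        best = self.logits.argmax(dim=-1)
        return self.cand_idx[torch.arange(len(best)), best]

def calibrate_block(fp_block, q_block, vq_params, steps=100):
    """Zero-data, block-wise calibration: random inputs stand in for real
    calibration data, and the quantized block is trained to match the
    floating-point block's output in mean squared error."""
    opt = torch.optim.Adam(vq_params, lr=1e-3)
    for _ in range(steps):
        x = torch.randn(8, 256, 1152)                  # synthetic activations
        loss = torch.nn.functional.mse_loss(q_block(x), fp_block(x).detach())
        opt.zero_grad(); loss.backward(); opt.step()
```
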
Quantized with VQ4DiT, the DiT XL/2 model demonstrates superior performance on the ImageNet 256×256 and 512×512 datasets across various sampling timesteps and weight bit-widths. At 256×256 resolution, VQ4DiT outperforms other methods, including RepQ-ViT, GPTQ, and Q-DiT, especially under 3-bit quantization, staying close to the floating-point model with minimal increases in FID and decreases in IS. At 2-bit quantization, where the other algorithms collapse, VQ4DiT continues to generate high-quality images with only a slight loss of precision. Similar results hold at 512×512 resolution, indicating that VQ4DiT can produce high-quality, high-resolution images with minimal memory usage, making it well suited to deploying DiTs on edge devices.

This study presents VQ4DiT, a robust post-training vector quantization method for DiTs that addresses key challenges in efficient quantization. By balancing codebook size against quantization error and resolving inconsistent gradient directions, VQ4DiT achieves optimal assignments and codebooks through a zero-data, block-wise calibration process: it computes candidate assignment sets for each sub-vector and progressively calibrates each layer's codebook and assignments. Experimental results demonstrate VQ4DiT's effectiveness in quantizing DiT weights to 2-bit precision while preserving high-quality image generation. This advance significantly improves the prospects for deploying DiTs on resource-constrained edge devices, opening new possibilities for efficient, high-quality image generation across applications.
