The Effect of Compression Techniques on Large Multimodal Language Models in the Medical Domain

cs.AI updates on arXiv.org, July 30, 12:12

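This paper evaluates how structural pruning and activation-aware quantization affect a LLAVA model applied to the medical domain. It proposes a new layer selection method for pruning, analyzes different quantization techniques, and assesses the performance trade-offs in a prune-SFT-quantize pipeline. The method lets a 7B-parameter MLLM run within 4 GB of VRAM, cutting memory usage by 70% while delivering 4% higher performance than traditional techniques.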

arXiv:2507.21976v1 Announce Type: new

Abstract: Multimodal Large Language Models (MLLMs) hold great potential for use in the medical domain, but their computational costs necessitate efficient compression techniques. This paper evaluates the impact of structural pruning and activation-aware quantization on a fine-tuned LLAVA model for medical applications. We propose a novel layer selection method for pruning, analyze different quantization techniques, and assess the performance trade-offs in a prune-SFT-quantize pipeline. Our proposed method enables MLLMs with 7B parameters to run within 4 GB of VRAM, reducing memory usage by 70% while achieving 4% higher model performance than traditional pruning and quantization techniques at the same compression ratio.
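To give a rough sense of the memory figures in the abstract, the sketch below estimates weight storage for a model at different bit widths, with an optional pruning step. It is only back-of-the-envelope arithmetic, not the paper's method: the function names, the 16-bit baseline, the 4-bit target, and the 20% pruning fraction are all illustrative assumptions, and the estimate ignores activations, the KV cache, the vision encoder, and quantization metadata such as scales and zero points.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def params_after_pruning(num_params: float, pruned_fraction: float) -> float:
    """Parameter count left after removing a fraction of the weights.

    Assumes (hypothetically) that pruned layers hold a proportional
    share of the parameters.
    """
    if not 0.0 <= pruned_fraction < 1.0:
        raise ValueError("pruned_fraction must be in [0, 1)")
    return num_params * (1.0 - pruned_fraction)


def memory_reduction(baseline_gb: float, compressed_gb: float) -> float:
    """Fractional memory saving relative to the baseline."""
    return 1.0 - compressed_gb / baseline_gb


if __name__ == "__main__":
    n = 7e9  # 7B parameters, as in the abstract

    baseline = weight_memory_gb(n, 16)        # assumed 16-bit baseline
    quant_only = weight_memory_gb(n, 4)       # assumed 4-bit quantization
    pruned = params_after_pruning(n, 0.20)    # hypothetical 20% pruning
    prune_quant = weight_memory_gb(pruned, 4)

    print(f"16-bit baseline:     {baseline:.1f} GB")
    print(f"4-bit only:          {quant_only:.1f} GB")
    print(f"prune 20% + 4-bit:   {prune_quant:.1f} GB")
    print(f"saving vs. baseline: {memory_reduction(baseline, prune_quant):.0%}")
```

Under these assumptions, 4-bit weights alone for a 7B model come to about 3.5 GB, so pruning is one way to leave headroom under a 4 GB budget for the components this estimate leaves out.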
