MarkTechPost@AI September 13, 2024
MedUnA: Efficient Medical Image Classification through Unsupervised Adaptation of Vision-Language Models

MedUnA is an unsupervised adaptation method for medical image classification that leverages the capabilities of vision-language models (VLMs), achieving efficient classification by aligning textual descriptions with images. MedUnA uses two-stage training: a pre-trained large language model (LLM) first generates textual descriptions, and an adapter is then trained to align textual and visual information, ultimately enabling unsupervised learning. MedUnA adapts effectively to medical image classification tasks, reduces reliance on labeled data, and improves scalability.

📑 **MedUnA uses two-stage training:** First, a pre-trained large language model (LLM) generates textual descriptions, and a text encoder embeds these descriptions into a vector space. A cross-modal adapter is then trained by minimizing the cross-entropy between the generated logits and the ground-truth labels. In the second stage, the adapter is trained further in an unsupervised manner, with weakly and strongly augmented versions of the input image processed by two separate branches. The strong branch uses a learnable prompt vector, and training minimizes the discrepancy between the outputs of the two branches.

📢 **MedUnA significantly improves classification accuracy:** Experiments on five public medical datasets show that MedUnA outperforms zero-shot MedCLIP in most cases, with particularly strong results on diseases such as tuberculosis, diabetic retinopathy, and skin cancer.

📣 **MedUnA reduces reliance on labeled data:** Unlike traditional approaches that require extensive pre-training data, MedUnA exploits the existing alignment between visual and textual embeddings, avoiding large-scale pre-training. It trains an adapter and prompt vector on unlabeled images and automatically generated LLM descriptions, achieving efficient classification performance.

📤 **MedUnA is scalable:** The method adapts effectively to a wide range of medical image classification tasks and can be extended easily to new datasets and diseases.

📥 **MedUnA improves the interpretability of medical image classification:** t-SNE plots show that MedUnA produces more distinct clusters, indicating that the method separates disease categories more clearly and thereby improves both classification accuracy and interpretability.

Supervised learning in medical image classification faces challenges due to the scarcity of labeled data, as expert annotations are difficult to obtain. Vision-Language Models (VLMs) address this issue by leveraging visual-text alignment, allowing unsupervised learning, and reducing reliance on labeled data. Pre-training on large medical image-text datasets enables VLMs to generate accurate labels and captions, lowering annotation costs. Active learning prioritizes key samples for expert annotation, while transfer learning fine-tunes pre-trained models on specific medical datasets. VLMs also generate synthetic images and annotations, enhancing data diversity and model performance in medical imaging tasks.

Researchers from Mohamed Bin Zayed University of AI and Inception Institute of AI propose MedUnA, a Medical Unsupervised Adaptation method for image classification. MedUnA employs two-stage training: Adapter Pre-training using text descriptions generated by an LLM aligned with class labels, followed by Unsupervised Learning. The adapter integrates with MedCLIP’s visual encoder, utilizing entropy minimization to align visual and text embeddings. MedUnA addresses the modality gap between textual and visual data, improving classification performance without extensive pre-training. This method efficiently adapts vision-language models for medical tasks, reducing reliance on labeled data and enhancing scalability.
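
To make the adapter idea concrete, the following is a minimal PyTorch sketch of a lightweight cross-modal adapter that maps frozen visual-encoder features into the text-embedding space and scores them against class text embeddings. The class and parameter names (`CrossModalAdapter`, `visual_dim`, `text_dim`) are illustrative assumptions, not MedUnA's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAdapter(nn.Module):
    """Lightweight adapter that projects frozen visual features into the
    shared text-embedding space and scores them against class text embeddings."""

    def __init__(self, visual_dim: int, text_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(visual_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, text_dim),
        )

    def forward(self, feats: torch.Tensor, class_text_embeds: torch.Tensor) -> torch.Tensor:
        # feats: (B, visual_dim) features from the frozen visual encoder
        # class_text_embeds: (C, text_dim) embeddings of LLM-generated class descriptions
        proj = F.normalize(self.net(feats), dim=-1)
        text = F.normalize(class_text_embeds, dim=-1)
        return proj @ text.t()  # cosine-similarity logits, shape (B, C)
```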

A common method for using VLMs in medical imaging involves extensive pre-training on large datasets, followed by fine-tuning for tasks like classification, segmentation, and report generation. Unlike these resource-intensive strategies, MedUnA leverages the existing alignment between visual and textual embeddings to avoid large-scale pre-training. It uses unlabeled images and auto-generated descriptions from an LLM for disease categories. A lightweight adapter and prompt vector are trained to minimize self-entropy, ensuring confident performance across multiple data augmentations. MedUnA offers improved efficiency and performance without the need for extensive pre-training.
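
As a rough illustration of the self-entropy objective mentioned above, the snippet below averages the class probabilities predicted for several augmented views of the same image and penalizes high entropy in that averaged prediction; the exact formulation and weighting used in MedUnA may differ.

```python
import torch
import torch.nn.functional as F

def self_entropy_loss(logits_per_view: torch.Tensor) -> torch.Tensor:
    """logits_per_view: (num_views, batch, num_classes) logits from augmented views."""
    # Average the softmax predictions across augmentations of the same image.
    probs = F.softmax(logits_per_view, dim=-1).mean(dim=0)
    # Penalize uncertain (high-entropy) average predictions.
    entropy = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=-1)
    return entropy.mean()
```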

The methodology consists of two stages: adapter pre-training and unsupervised training. In Stage 1, textual descriptions for each class are generated using an LLM and embedded via a text encoder. A cross-modal adapter is trained by minimizing cross-entropy between the generated logits and ground-truth labels. In Stage 2, the adapter is further trained using medical images in an unsupervised manner, with weak and strong augmentations of the input passed through two branches. The strong branch uses a learnable prompt, and training minimizes the difference between the outputs of the two branches. Inference is performed using the optimized strong branch.
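
The sketch below outlines one possible shape of the two stages, reusing the `CrossModalAdapter` from the earlier snippet, a frozen `visual_encoder`, and a learnable prompt that is simply added to the class text embeddings in the strong branch. It is a simplified approximation under those assumptions, not the authors' code; at inference only the optimized strong branch would be used, consistent with the description above.

```python
import torch
import torch.nn.functional as F

def stage1_step(adapter, desc_embeds, class_text_embeds, labels, optimizer):
    """Stage 1: adapter pre-training on LLM-generated class descriptions (no images).
    desc_embeds and class_text_embeds are assumed to share the VLM's embedding dimension."""
    logits = adapter(desc_embeds, class_text_embeds)
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def stage2_step(visual_encoder, adapter, learnable_prompt, class_text_embeds,
                weak_imgs, strong_imgs, optimizer):
    """Stage 2: unsupervised training on weak/strong augmentations of unlabeled images."""
    # Weak branch: frozen pass that provides soft pseudo-targets.
    with torch.no_grad():
        weak_probs = F.softmax(
            adapter(visual_encoder(weak_imgs), class_text_embeds), dim=-1)
    # Strong branch: the learnable prompt perturbs the class text embeddings.
    prompted_text = class_text_embeds + learnable_prompt  # broadcasts over classes
    strong_logits = adapter(visual_encoder(strong_imgs), prompted_text)
    # Pull the strong branch's predictions toward the weak branch's.
    loss = F.kl_div(F.log_softmax(strong_logits, dim=-1), weak_probs,
                    reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```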

The experiments tested the proposed method on five public medical datasets covering diseases such as tuberculosis, pneumonia, diabetic retinopathy, and skin cancer. Text descriptions for the classes in each dataset were generated using GPT-3.5 and other language models and then fed into a text classifier. The method was evaluated with both CLIP and MedCLIP visual encoders, with MedCLIP performing better overall. Unsupervised learning was used to generate pseudo-labels for unlabeled images, and models were trained with the SGD optimizer. Results showed that MedUnA achieved superior accuracy compared to baseline models.

The study analyzes the experimental results, highlighting the performance of MedUnA compared to other methods like CLIP, MedCLIP, LaFTer, and TPT. MedUnA demonstrates notable accuracy improvements on several medical datasets, particularly outperforming zero-shot MedCLIP in most cases. Minimal improvement is observed on the Pneumonia dataset due to MedCLIP’s pre-training. Additionally, t-SNE plots indicate that MedUnA produces more distinct clustering, enhancing classification precision. The correlation between text classifier accuracy from various LLMs and MedUnA’s performance is also explored, along with an ablation study on the impact of different loss functions.


Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.

