MarkTechPost@AI September 29, 2024
Multi-View and Multi-Scale Alignment (MaMA): Advancing Mammography with Contrastive Learning and Visual-Language Pre-training

MaMA is a new framework that adapts CLIP to mammography. Through multi-view and multi-scale alignment it tackles the modality's key challenges, performs strongly across multiple tasks and datasets, and its code is open source.

🎯 MaMA exploits the multi-view nature of mammography and aligns image features at different scales, addressing limited data, high-resolution images, and dataset imbalance.

💪 MaMA significantly outperforms existing methods on two large mammography datasets, excelling across multiple tasks despite being only 52% the size of the largest baseline.

📄 MaMA introduces a method for constructing structured mammography reports from tabular data and adopts a multi-view contrastive image-text pre-training approach, improving model performance.

🌟 MaMA shows strong generalization on out-of-domain cancer detection with the RSNA-Mammo dataset, achieving higher balanced accuracy and AUC scores.

Multi-View and Multi-Scale Alignment for Mammography Contrastive Learning:
Contrastive Language-Image Pre-training (CLIP) has shown potential in medical imaging, but its application to mammography faces challenges due to limited labeled data, high-resolution images, and imbalanced datasets. This study introduces the first full adaptation of CLIP to mammography through a new framework called Multi-view and Multi-scale Alignment (MaMA). Mammography’s inherent complexities, such as multi-view images with small regions of interest, bilateral asymmetry, and ipsilateral correspondence, demand specialized approaches. MaMA addresses these issues by leveraging the multi-view nature of mammography and aligning image features at different scales. It also uses a symmetric local alignment module to focus on detailed features and a parameter-efficient fine-tuning approach to enhance pre-trained LLMs with medical knowledge. This allows the framework to overcome data scarcity and perform better on mammography tasks.
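To make the global multi-view objective concrete, here is a minimal sketch of a CLIP-style contrastive loss over the two standard views (CC and MLO) of the same breast. The averaging fusion, function names, and tensor shapes are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def multiview_clip_loss(img_cc, img_mlo, txt, temperature=0.07):
    """img_cc, img_mlo: [B, D] embeddings of the CC and MLO views of one breast;
    txt: [B, D] embeddings of the matching report text. All names are hypothetical."""
    # Fuse the two views into one study-level image embedding (simple averaging here;
    # the paper's exact use of ipsilateral correspondence may differ).
    img = F.normalize((img_cc + img_mlo) / 2, dim=-1)
    txt = F.normalize(txt, dim=-1)
    logits = img @ txt.t() / temperature                    # [B, B] similarity matrix
    targets = torch.arange(img.size(0), device=img.device)  # matching pairs on the diagonal
    # Symmetric InfoNCE: image-to-text and text-to-image cross-entropy.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```

In MaMA, this kind of global alignment is complemented by the local, patch-level alignment described in the Method section below.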

The MaMA model significantly outperforms existing state-of-the-art methods across multiple tasks on two large mammography datasets, EMBED and RSNA-Mammo, despite using only 52% of the model size compared to the largest baseline. By combining multi-view image alignment and text-image relationships, MaMA effectively learns detailed image representations while maintaining efficient resource usage. This method demonstrates its potential to enhance mammography interpretation through visual-language pre-training, improving cancer detection and diagnosis with fewer computational demands. The code is available for public use to promote further research in this area.

Medical Visual-Language Pre-training Methods:
Existing medical Visual-Language Pre-training (VLP) models fall into two types. The first involves general-purpose models trained on large-scale datasets spanning multiple anatomical sites, which show strong generalization but are often outperformed by modality-specific models. The second type focuses on chest X-rays due to the availability of extensive datasets, though these models face limitations such as pixel imbalance and report alignment. Multi-view contrastive learning, which aligns images from different perspectives, has been applied to mammography but has yet to be fully integrated with CLIP to exploit multimodal supervision signals.

Method:
The proposed MaMA framework introduces a method for constructing structured mammography reports from tabular data and incorporates a multi-view contrastive image-text pre-training approach. It utilizes a template-based caption generation to enhance image understanding and prevent oversimplification. A multi-view contrastive learning framework improves the model’s capability by comparing mammogram views, while the Symmetric Local Alignment (SLA) module enables fine-grained correspondence between image patches and text. Additionally, parameter-efficient fine-tuning (PEFT) of a large pre-trained LLM is employed to improve text encoding, enhancing overall performance without increasing computational costs.
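As a rough illustration of fine-grained alignment in the spirit of the SLA module, the sketch below scores each image-text pair by matching every text token to its most similar image patch and vice versa; the authors' exact formulation may differ, and all names and shapes here are assumptions.

```python
import torch
import torch.nn.functional as F

def local_alignment_logits(patch_feats, token_feats, temperature=0.07):
    """patch_feats: [B, P, D] image-patch embeddings; token_feats: [B, T, D] text-token
    embeddings. Returns a [B, B] image-text similarity matrix built from local matches."""
    patch_feats = F.normalize(patch_feats, dim=-1)
    token_feats = F.normalize(token_feats, dim=-1)
    # Pairwise patch-token similarities for every image i and text j: [B, B, T, P]
    sim = torch.einsum("ipd,jtd->ijtp", patch_feats, token_feats)
    # Text-to-image: each token keeps its best-matching patch, then average over tokens.
    t2i = sim.max(dim=-1).values.mean(dim=-1)   # [B, B]
    # Image-to-text: each patch keeps its best-matching token, then average over patches.
    i2t = sim.max(dim=-2).values.mean(dim=-1)   # [B, B]
    # Symmetric combination; these logits can feed the same InfoNCE objective as the global loss.
    return (t2i + i2t) / (2 * temperature)
```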

Model Performance on Mammography Datasets:
The experiments utilized the Emory EMBED dataset, comprising over 72,000 multi-view mammograms from 23,356 patients, split into training, validation, and test sets (70%/10%/20%). The model architecture featured DINOv2 ViT-B/14 as the image encoder and BioMedLM as the text encoder, with LoRA used for parameter-efficient fine-tuning. Training used the AdamW optimizer with a 4e-5 learning rate, a cosine annealing scheduler, and the SLA loss, with a batch size of 144 across four GPUs. The primary evaluation focused on BI-RADS assessment and breast density prediction, using metrics such as balanced accuracy (bACC) and AUC.
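This recipe maps onto standard tooling. Below is a hedged sketch of how LoRA fine-tuning of the text encoder and the stated optimizer and scheduler settings could be wired up with PyTorch and the Hugging Face peft library; the checkpoint id, LoRA rank, and target modules are illustrative assumptions, not values reported in the paper.

```python
import torch
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

# Text encoder: BioMedLM, assumed to live at the hub id below (loading details may differ).
text_encoder = AutoModel.from_pretrained("stanford-crfm/BioMedLM")

# LoRA adapters; rank, alpha, dropout, and target modules are illustrative choices.
lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.1, target_modules=["c_attn"])
text_encoder = get_peft_model(text_encoder, lora_cfg)  # only the adapter weights stay trainable

# Optimizer and schedule as described above: AdamW at 4e-5 with cosine annealing.
num_training_steps = 10_000  # placeholder; derive from epochs * steps_per_epoch in practice
trainable = [p for p in text_encoder.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=4e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_training_steps)
```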

MaMA, the proposed model, outperformed baselines such as CLIP, ConVIRT, and MM-MIL in zero-shot and full fine-tuning settings. It demonstrated a 4% improvement in balanced accuracy for BI-RADS and excelled in breast density prediction. MaMA’s robustness was further validated on the out-of-domain RSNA-Mammo dataset for cancer detection, where it achieved higher balanced accuracy and AUC scores compared to the baselines while maintaining adequate sensitivity and specificity. This highlights MaMA’s strong generalization capabilities even with limited training data.


Check out the Paper. All credit for this research goes to the researchers of this project.

