Efficient Masked Attention Transformer for Few-Shot Classification and Segmentation

cs.AI updates on arXiv.org, the day before yesterday at 12:08

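This paper proposes EMAT, a new algorithm that targets the difficulty of recognizing small objects in few-shot classification and segmentation. Through a novel attention mechanism, a downscaling strategy, and parameter optimizations, EMAT significantly improves performance on the PASCAL-5$^i$ and COCO-20$^i$ datasets while reducing the number of parameters, and it introduces new evaluation settings that better reflect practical use.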

arXiv:2507.23642v1 (announce type: cross)

Abstract: Few-shot classification and segmentation (FS-CS) focuses on jointly performing multi-label classification and multi-class segmentation using few annotated examples. Although the current state of the art (SOTA) achieves high accuracy in both tasks, it struggles with small objects. To overcome this, we propose the Efficient Masked Attention Transformer (EMAT), which improves classification and segmentation accuracy, especially for small objects. EMAT introduces three modifications: a novel memory-efficient masked attention mechanism, a learnable downscaling strategy, and parameter-efficiency enhancements. EMAT outperforms all FS-CS methods on the PASCAL-5$^i$ and COCO-20$^i$ datasets, using at least four times fewer trainable parameters. Moreover, as the current FS-CS evaluation setting discards available annotations, despite their costly collection, we introduce two novel evaluation settings that consider these annotations to better reflect practical scenarios.

Related tags

EMAT algorithm, few-shot learning, small object recognition, multi-label classification, multi-class segmentation
