MarkTechPost@AI · October 3, 2024
Researchers from UC Berkeley Present UnSAM in Computer Vision: A New Paradigm for Segmentation with Minimal Data, Achieving State-of-the-Art Results Without Human Annotation

 

Researchers at UC Berkeley have developed Unsupervised SAM (UnSAM), a novel unsupervised method for segmentation tasks. UnSAM uses a divide-and-conquer strategy to identify the hierarchical structure of visual scenes and create segmentation masks at different levels of granularity. It performs strongly across multiple datasets, demonstrating that high-quality results can be obtained without massive human-annotated datasets.

🧐 UnSAM uses a divide-and-conquer strategy to identify the hierarchical structure of visual scenes and, based on that structure, creates segmentation masks at varying granularities that capture even the subtlest details, delivering performance on par with models trained on SA-1B.

🎯 CutLER lays the foundation for UnSAM's divide strategy: its cut-and-learn pipeline and Normalized Cuts-based method obtain semantic- and instance-level masks from unlabeled raw images, with a threshold filtering out noisy masks.

💪 Trained on just 1% of the SA-1B dataset, UnSAM surpasses SAM by more than 6.7% in AR and 3.9% in AP. Evaluated on PartImageNet and PACO, it exceeds the previous SOTA by 16.6% and 12.6%, respectively, and UnSAM+ even outperforms SAM by 1.3%.

🌟 UnSAM shows that high-quality results are attainable without enormous, labor-intensive human-annotated datasets. Paired with small, lightweight architectures, it can advance sensitive fields such as medicine and open a new era of unsupervised visual learning.

Transformer-based models have set off a new transformation in computer vision segmentation. Meta's Segment Anything Model (SAM) has become a benchmark thanks to its robust performance, and supervised segmentation continues to gain popularity in fields such as medicine, defense, and industry. However, SAM still requires manual labeling, which makes training difficult: human annotation is cumbersome, unreliable for sensitive applications, and both expensive and time-consuming. Annotation also imposes a tradeoff between accuracy and scalability, limiting how far the architecture's potential can be exploited. SA-1B, despite its enormous size, contains just 11 million images with biased labels. These issues call for a label-free approach that offers performance on par with SAM at a much lower cost.

Active strides have been made in this direction with self-supervised transformers and prompt-enabled zero-shot segmentation. TokenCut and LOST were early efforts, followed by CutLER, which generates high-quality pseudo-masks for multiple instances and then learns from them. VideoCutLER extended this functionality to videos.

Researchers at UC Berkeley developed Unsupervised SAM (UnSAM), a novel unsupervised method that addresses this challenge. UnSAM uses a divide-and-conquer strategy to identify hierarchical structures in visual scenes and, based on those structures, create segmentation masks at varying levels of granularity. Its top-down and bottom-up clustering strategies generate masks that capture the subtlest details, yielding performance on par with models trained on SA-1B's human annotations. Serving as pseudo ground-truth labels, these masks enable whole-image and interactive segmentation that captures minutiae better than SAM.
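The multi-granularity masks UnSAM produces ultimately act as pseudo ground-truth labels for training. As a rough illustration of that idea only (not the authors' implementation), the sketch below pools hypothetical coarse, instance-level masks and fine, part-level masks into a single label set, using mask IoU to drop near-duplicates; the function name `pool_multigranularity_masks` and the 0.9 IoU threshold are assumptions made for this example.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU between two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def pool_multigranularity_masks(coarse_masks, fine_masks, iou_thresh=0.9):
    """Pool coarse (instance-level) and fine (part-level) masks into one
    pseudo-label set, greedily dropping near-duplicates by IoU."""
    pooled = []
    # Larger masks first, so coarse regions survive and their duplicates are dropped.
    for m in sorted(list(coarse_masks) + list(fine_masks), key=lambda x: -x.sum()):
        if all(mask_iou(m, kept) < iou_thresh for kept in pooled):
            pooled.append(m)
    return pooled

# Toy usage: two coarse masks and one part-level mask on an 8x8 grid.
coarse = [np.zeros((8, 8), bool), np.zeros((8, 8), bool)]
coarse[0][:4, :] = True            # top half
coarse[1][4:, :] = True            # bottom half
fine = [np.zeros((8, 8), bool)]
fine[0][:2, :4] = True             # a part inside the first coarse region
labels = pool_multigranularity_masks(coarse, fine)
print(len(labels))                 # 3 distinct pseudo-labels
```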

It is worth discussing CutLER before diving into the details of UnSAM. CutLER introduces a cut-and-learn pipeline built around MaskCut, a normalized-cut-based method that generates high-quality masks from a patch-wise cosine-similarity matrix computed on features of an unsupervised ViT. MaskCut operates iteratively, masking out the patches of previously segmented instances at each step. This methodology forms the foundation of UnSAM's divide strategy, which leverages CutLER's Normalized Cuts (NCut)-based method to obtain semantic- and instance-level masks from unlabeled raw images; a threshold then filters the generated masks to remove noise. The conquer strategy captures the subtler structure: within each coarse-grained mask, it iteratively merges similar regions bottom-up into simpler parts, and Non-Maximum Suppression eliminates the redundancy left after merging. This bottom-up clustering differentiates UnSAM from earlier work and lets it "conquer" other methods while capturing the finest details.
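For intuition, here is a minimal NumPy sketch of the two stages described above: a MaskCut-style divide step that bipartitions a patch-wise cosine-similarity graph along the second-smallest eigenvector of the normalized Laplacian, iterating while masking out already-segmented patches, followed by a greedy bottom-up merge standing in for the conquer stage. Function names, thresholds, and the merging criterion are illustrative assumptions, not UnSAM's actual implementation.

```python
import numpy as np

def ncut_bipartition(feats: np.ndarray, tau: float = 0.15) -> np.ndarray:
    """One Normalized-Cut split over ViT patch features (one row per patch).

    Builds a cosine-similarity affinity matrix, forms the symmetric normalized
    Laplacian, and bipartitions along its second-smallest eigenvector.
    Returns a boolean foreground mask (smaller side kept as foreground)."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    W = f @ f.T
    W = np.where(W > tau, W, 1e-6)                 # suppress weak affinities
    d = W.sum(axis=1)
    D_isqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(f)) - D_isqrt @ W @ D_isqrt
    _, vecs = np.linalg.eigh(L_sym)                # eigenvalues in ascending order
    fiedler = vecs[:, 1]                           # second-smallest eigenvector
    side = fiedler > np.median(fiedler)
    return side if side.sum() <= (~side).sum() else ~side

def divide(feats: np.ndarray, num_masks: int = 3):
    """MaskCut-style iteration: cut, record the mask, then exclude its patches."""
    remaining = np.ones(len(feats), dtype=bool)
    masks = []
    for _ in range(num_masks):
        idx = np.flatnonzero(remaining)
        if len(idx) < 4:
            break
        fg = ncut_bipartition(feats[idx])
        if not fg.any():
            break
        mask = np.zeros(len(feats), dtype=bool)
        mask[idx[fg]] = True
        masks.append(mask)
        remaining &= ~mask                         # mask out segmented patches
    return masks

def conquer(fragments, feats, sim_thresh=0.8):
    """Greedy bottom-up merge: fuse fragments whose mean patch features agree."""
    merged = [m.copy() for m in fragments]
    fused = True
    while fused:
        fused = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                fi = feats[merged[i]].mean(axis=0)
                fj = feats[merged[j]].mean(axis=0)
                cos = fi @ fj / (np.linalg.norm(fi) * np.linalg.norm(fj))
                if cos > sim_thresh:
                    merged[i] |= merged[j]         # absorb fragment j into i
                    del merged[j]
                    fused = True
                    break
            if fused:
                break
    return merged
```

In UnSAM proper, the divide stage operates on features from a self-supervised ViT and the conquer stage works at a much finer level with carefully designed merging criteria; the sketch above only conveys the control flow.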

UnSAM outperforms SAM by over 6.7% in AR and 3.9% in AP on the SA-1B dataset when trained on only 1% of it. Even against SAM trained on the complete 11M-image dataset, its performance differs by a mere 1%, despite a training set of just 0.4M images. On average, UnSAM surpasses the previous SOTA by 11.0% in AR, and when evaluated on PartImageNet and PACO it exceeds the SOTA by 16.6% and 12.6%, respectively. Furthermore, UnSAM+, which combines the accuracy of ground-truth labels from 1% of SA-1B with the fine detail of the unsupervised masks, outperforms even SAM by 1.3%, and does so with a backbone three times smaller.

In conclusion, UnSAM proves that high-quality results can be obtained without humongous datasets built through intensive human labor. Small, lightweight architectures could be used alongside UnSAM-generated masks to advance sensitive fields such as medicine and science. UnSAM may not be the big bang of segmentation, but it shines a light on the field's still largely unexplored cosmos, ushering in a new era of research in unsupervised vision learning.


Check out the Paper, Project, and GitHub. All credit for this research goes to the researchers of this project.


