MarkTechPost@AI July 31, 2024
Meta AI Introduces Meta Segment Anything Model 2 (SAM 2): The First Unified Model for Segmenting Objects Across Images and Videos

Meta has introduced SAM 2, a unified model for real-time, promptable object segmentation in images and videos, offering a range of powerful capabilities and broad application prospects.

🎯SAM 2 is the next generation of Meta's Segment Anything Model. It extends its predecessor's capabilities to real-time segmentation and tracking of objects in images and videos, requires no custom adaptation, and generalizes zero-shot to segment any object.

💪SAM 2 is highly efficient, requiring three times less interaction time than previous models while achieving better image and video segmentation accuracy, which is crucial for real-world applications.

🌐SAM 2 has a wide range of applications: it can generate new video effects in the creative industries, speed up visual data annotation, support research and diagnosis in science and medicine, and assist with monitoring in drone footage.

🔓In keeping with Meta's commitment to open science, the SAM 2 project open-sources the model code and weights and releases a large dataset, fostering collaboration and innovation across the AI community.

Meta has introduced SAM 2, the next generation of its Segment Anything Model. Building on the success of its predecessor, which focused primarily on images, SAM 2 is a unified model designed for real-time promptable object segmentation in both images and videos. The new model handles video natively, offering real-time segmentation and tracking of objects across frames. It does so without custom adaptation, thanks to its ability to generalize to new and unseen visual domains: SAM 2's zero-shot generalization means it can segment any object in any image or video, making it highly versatile and adaptable to a wide range of use cases.
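To make "promptable" segmentation concrete, here is a minimal, hedged sketch of point-prompted image segmentation. It assumes the Python interface published in Meta's segment-anything-2 repository (build_sam2 and SAM2ImagePredictor); the checkpoint path, config name, image file, and click coordinates below are placeholders, and the exact names should be verified against the released code.

```python
# Hedged sketch of point-prompted image segmentation with SAM 2.
# Assumes the interface from Meta's segment-anything-2 repository
# (build_sam2 / SAM2ImagePredictor); paths and names are placeholders.
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "checkpoints/sam2_hiera_large.pt"  # placeholder checkpoint path
model_cfg = "sam2_hiera_l.yaml"                 # placeholder config name

predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))
image = np.array(Image.open("example.jpg").convert("RGB"))

with torch.inference_mode():
    predictor.set_image(image)
    # A single foreground click (x, y) is enough to prompt a mask.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),   # 1 = foreground, 0 = background
        multimask_output=True,        # return several candidate masks
    )

print(masks.shape, scores)  # e.g. (3, H, W) masks with predicted quality scores
```

The same prompting style (points, boxes, or masks) carries over to video, where a prompt on one frame is tracked through the rest of the clip.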

One of the most notable features of SAM 2 is its efficiency. It requires three times less interaction time than previous models while achieving superior image and video segmentation accuracy. This efficiency is crucial for practical applications where time and precision are of the essence.

The potential applications of SAM 2 are vast and varied. For instance, in the creative industry, the model can generate new video effects, enhancing the capabilities of generative video models and unlocking new avenues for content creation. In data annotation, SAM 2 can expedite the labeling of visual data, thereby improving the training of future computer vision systems. This is particularly beneficial for industries relying on large datasets for training, such as autonomous vehicles and robotics.

SAM 2 holds promise in the scientific and medical fields. It can segment moving cells in microscopic videos, aiding research and diagnostic processes. The model’s ability to track objects in drone footage can assist in monitoring wildlife and conducting environmental studies.

In line with Meta’s commitment to open science, the SAM 2 project includes releasing the model’s code and weights under an Apache 2.0 license. This openness encourages collaboration and innovation within the AI community, allowing researchers and developers to explore new capabilities and applications of the model. Meta has also released the SA-V dataset, a comprehensive collection of approximately 51,000 real-world videos and over 600,000 spatio-temporal masks (masklets), under a CC BY 4.0 license. This dataset is significantly larger than previous video segmentation datasets, providing a rich resource for training and testing segmentation models.

The development of SAM 2 involved significant technical innovations. The model’s architecture builds on the foundation laid by SAM, extending its capabilities to handle video data. This involves a memory mechanism that enables the model to recall previously processed information and accurately segment objects across video frames. The memory encoder, memory bank, and memory attention module are critical components that allow SAM 2 to manage the complexities of video segmentation, such as object motion, deformation, and occlusion.
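The exact architecture is described in the paper; purely as a conceptual illustration of the idea, the toy PyTorch module below shows how a small rolling memory bank plus cross-attention can let the current frame's features attend to encodings of previously segmented frames. Module names, feature shapes, and the FIFO eviction policy are illustrative assumptions, not Meta's implementation.

```python
# Conceptual sketch (not Meta's implementation) of a memory bank plus
# memory attention for video segmentation. Shapes are illustrative only.
import torch
import torch.nn as nn

class ToyMemoryAttention(nn.Module):
    def __init__(self, dim=256, max_memories=7):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.max_memories = max_memories
        self.memory_bank = []  # FIFO of fused (frame feature, mask) encodings

    def write(self, frame_feats, mask_feats):
        # "Memory encoder" stand-in: fuse image features with the predicted mask.
        memory = frame_feats + mask_feats
        self.memory_bank.append(memory)
        if len(self.memory_bank) > self.max_memories:
            self.memory_bank.pop(0)  # keep only the most recent frames

    def read(self, current_feats):
        # "Memory attention" stand-in: the current frame attends over stored
        # memories, so the decoder sees the object's recent appearance/position.
        if not self.memory_bank:
            return current_feats
        memories = torch.cat(self.memory_bank, dim=1)  # (B, tokens * T, dim)
        out, _ = self.attn(current_feats, memories, memories)
        return current_feats + out

# Toy usage: batch of 1, 1024 tokens per frame, 256-dim features.
mem = ToyMemoryAttention()
first_frame = torch.randn(1, 1024, 256)
_ = mem.read(first_frame)                         # first frame: no memories yet
mem.write(first_frame, torch.randn(1, 1024, 256)) # store its mask encoding
_ = mem.read(torch.randn(1, 1024, 256))           # later frames use the memory
```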

The SAM 2 team developed a promptable visual segmentation task to address the challenges posed by video data. This task allows the model to take an input prompt on any video frame and predict a segmentation mask, which is then propagated across all frames to form a spatio-temporal masklet. Additional prompts on any frame iteratively refine the result, ensuring precise segmentation.
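In practice, the released video predictor exposes this task as: initialize state over a clip, add a click (or box) prompt on one frame, then propagate to obtain a masklet for every frame. The sketch below follows the interface in the segment-anything-2 repository (build_sam2_video_predictor, init_state, add_new_points_or_box, propagate_in_video); the paths are placeholders and the exact argument names are assumptions to be checked against the repo.

```python
# Hedged sketch of promptable video segmentation: one click on one frame is
# propagated into a spatio-temporal masklet. API names follow the
# segment-anything-2 repository and are treated as assumptions here.
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor("sam2_hiera_l.yaml",            # placeholder config
                                       "checkpoints/sam2_hiera_large.pt")  # placeholder weights

with torch.inference_mode():
    # Frames are read from a directory of JPEGs (placeholder path).
    state = predictor.init_state(video_path="video_frames/")

    # Prompt: one positive click on object 1 in frame 0.
    predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),  # 1 = foreground click
    )

    # Propagate the prompt through the whole clip to get per-frame masks.
    masklet = {}
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masklet[frame_idx] = {
            obj_id: (mask_logits[i] > 0.0).cpu().numpy()
            for i, obj_id in enumerate(obj_ids)
        }
```

Corrective clicks on later frames can be added with the same prompting call before re-propagating, which is the iterative refinement loop described above.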

In conclusion, SAM 2 offers unparalleled real-time object segmentation capabilities in images and videos. Its versatility, efficiency, and open-source nature make it a valuable tool for many applications, from creative industries to scientific research. By sharing SAM 2 with the global AI community, Meta fosters innovation and collaboration, paving the way for future breakthroughs in computer vision technology.

"Up until today, annotating masklets in videos has been clunky; combining the first SAM model with other video object segmentation models. With SAM 2 annotating masklets will reach a whole new level. I consider the reported 8x speedup to be the lower bound of what is achievable with the right UX, and with +1M inferences with SAM on the Encord platform, we’ve seen the tremendous value that these types of models can provide to ML teams. " - Dr Frederik Hvilshøj - Head of ML at Encord

Check out the Paper, download the Model and Dataset, and try the demo. All credit for this research goes to the researchers of this project.
