MarkTechPost@AI, October 20, 2024
Meta AI Releases Cotracker3: A Semi-Supervised Tracker that Produces Better Results with Unlabelled Data and Simple Architecture

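CoTracker3 is a new tracking model from Meta. It takes a semi-supervised approach with a simpler mechanism and can be trained on unlabeled real videos. It removes some components of earlier trackers, achieving better results with a smaller architecture and less training data, and performs strongly on several fronts.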

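🎯 CoTracker3 is a new tracking model that uses pseudo-labels to train on unannotated real videos. It removes some components of earlier trackers and achieves better results with a smaller architecture.

📽️ CoTracker3 works in a straightforward way: it predicts the corresponding point track for every frame of a video, provides visibility and confidence scores, and comes in online and offline versions.

💪 CoTracker3 is leaner and faster than other trackers in the field, with fewer parameters and higher speed. It is highly competitive across benchmarks and in some cases even surpasses state-of-the-art models.

✨ CoTracker3 draws inspiration from earlier models and uses a simple semi-supervised training protocol that combines their strengths, outperforming other trackers.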

Point tracking is crucial in video: from 3D reconstruction to editing tasks, points must be located precisely to achieve quality results. Over time, trackers have adopted transformer- and neural-network-based designs to track individual and multiple points simultaneously. However, these networks can be fully exploited only with high-quality training data. Real videos that could make up a good training set are abundant, but the tracked points in them would have to be annotated manually. Synthetic videos seem an excellent substitute, but they are computationally expensive to produce and less valuable for training than real videos. Given this situation, unsupervised learning on real videos shows great potential. This article looks at a new effort to surpass the state of the art in tracking with a semi-supervised approach and a much simpler mechanism.

Meta has introduced CoTracker3, a new tracking model that can be trained on real videos without annotations, using pseudo-labels generated by off-the-shelf teacher models. CoTracker3 removes components from previous trackers and achieves better results with a much smaller architecture and far less training data. It also addresses the question of scalability. Researchers have done impressive work on unsupervised tracking with real videos, but the complexity and requirements of those methods are open to question: the current state of the art in unsupervised tracking relies on enormous numbers of training videos and complex architectures. The first question, then, is: are millions of training videos really necessary for a tracker to perform well? The second concerns architecture. Different researchers have improved on previous works over time, yet it remains unclear whether all of these design elements are required for good tracking, or whether some could be removed or replaced with simpler alternatives.
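To make the pseudo-labeling idea concrete, here is a minimal Python sketch, assuming each teacher is simply a callable that maps a video and its query points to predicted tracks. The names `make_pseudo_labels`, `PseudoLabel`, `static_teacher`, and `drifting_teacher` are hypothetical illustrations, not part of the CoTracker3 codebase, and the step in which a student model is fine-tuned on the labels is left out.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates
Track = List[Point]          # one point per frame
# Hypothetical teacher interface: (video_id, query points) -> one track per query.
Teacher = Callable[[str, Sequence[Point]], List[Track]]


@dataclass
class PseudoLabel:
    video_id: str
    teacher_name: str
    queries: List[Point]
    tracks: List[Track]


def make_pseudo_labels(
    video_ids: Sequence[str],
    teachers: Dict[str, Teacher],
    sample_queries: Callable[[str], Sequence[Point]],
    seed: int = 0,
) -> List[PseudoLabel]:
    """Annotate unlabeled videos by asking one randomly chosen teacher per video."""
    rng = random.Random(seed)   # seeded so the teacher choice is reproducible
    names = sorted(teachers)    # fixed order so the seed fully determines the choices
    labels = []
    for vid in video_ids:
        name = rng.choice(names)  # pick a teacher at random for this video
        queries = list(sample_queries(vid))
        tracks = teachers[name](vid, queries)
        labels.append(PseudoLabel(vid, name, queries, tracks))
    return labels


# Toy stand-ins for pretrained teachers (hypothetical, 4-frame videos).
def static_teacher(vid: str, queries: Sequence[Point]) -> List[Track]:
    return [[q] * 4 for q in queries]  # the point never moves


def drifting_teacher(vid: str, queries: Sequence[Point]) -> List[Track]:
    return [[(q[0] + t, q[1]) for t in range(4)] for q in queries]  # moves right 1 px per frame


teachers = {"static": static_teacher, "drift": drifting_teacher}
labels = make_pseudo_labels(["vid_a", "vid_b", "vid_c"], teachers, lambda v: [(5.0, 7.0)])
for lab in labels:
    print(lab.video_id, lab.teacher_name, lab.tracks[0])
```

Picking a teacher at random, rather than always using the same one, exposes the student to a mix of teachers' predictions instead of one model's biases; choosing per video, as done here, is an assumption of this sketch rather than a detail confirmed by the article.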

CoTracker3 combines ideas from previous works and improves on them. For instance, it takes iterative updates and convolutional features from PIPs, and unrolled training from its own predecessor, CoTracker. Its working methodology is straightforward: given a set of query points, it predicts the corresponding point track in every frame of the video, together with visibility and confidence scores. Visibility indicates whether the tracked point is visible or occluded, while confidence measures whether the network believes the tracked point lies within a certain distance of the ground truth in the current frame. CoTracker3 comes in two versions, online and offline. The online version operates in a sliding window, processing the input video sequentially and tracking points only forward in time. The offline version, by contrast, processes the entire video as a single sliding window.
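The difference between the two modes comes down to how frames are grouped. The sketch below uses hypothetical helpers, not the project's API, to list the `[start, end)` frame ranges each mode would process. The window length of 16 and stride of 8 are assumed values chosen for illustration; the windows overlap so that tracks can be carried from one window into the next.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # [start, end) frame indices


def online_windows(num_frames: int, window_len: int = 16, stride: int = 8) -> List[Window]:
    """Overlapping windows visited in order; each one only sees frames up to its end."""
    if window_len <= 0 or not 0 < stride <= window_len:
        raise ValueError("need window_len > 0 and 0 < stride <= window_len")
    windows: List[Window] = []
    start = 0
    while start < num_frames:
        end = min(start + window_len, num_frames)
        windows.append((start, end))
        if end == num_frames:  # this window already reaches the final frame
            break
        start += stride
    return windows


def offline_windows(num_frames: int) -> List[Window]:
    """The whole clip is treated as one window, so every frame can inform every other frame."""
    return [(0, num_frames)] if num_frames > 0 else []


print(online_windows(40))   # [(0, 16), (8, 24), (16, 32), (24, 40)]
print(offline_windows(40))  # [(0, 40)]
```

In the online schedule the amount of video held at once is bounded by the window length no matter how long the clip is, while the offline schedule's single window grows with the video.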

For training, the dataset consisted of around 100,000 videos. Multiple teacher models, trained on synthetic data, are used to annotate them: a teacher is chosen at random to produce the pseudo-labels, and query points are sampled from video frames using SIFT keypoint detection. On the technical side, a convolutional network extracts a feature map for each frame, and correlations are computed between these feature vectors. The resulting 4D correlations are processed by an MLP. A transformer then iteratively updates the tracks along with the visibility and confidence values, which are initialized to 0.
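The correlation step can be pictured with a short NumPy sketch. It assumes per-frame feature maps have already been produced by some CNN (random arrays stand in for them here), compares a square neighborhood around the query point with a neighborhood around the current track estimate to form a 4D correlation volume, and flattens that volume into a small MLP. The neighborhood radius, layer sizes, integer pixel positions, and untrained random weights are all simplifying assumptions; the real model's sampling, feature scales, and layer shapes are not reproduced here.

```python
import numpy as np


def local_patch(feat: np.ndarray, y: int, x: int, r: int) -> np.ndarray:
    """(2r+1, 2r+1, C) neighborhood of an (H, W, C) feature map around integer (y, x), zero-padded."""
    padded = np.pad(feat, ((r, r), (r, r), (0, 0)))  # constant (zero) padding on H and W only
    # Pixel (y, x) moves to (y + r, x + r) after padding, so this slice is centered on it.
    return padded[y:y + 2 * r + 1, x:x + 2 * r + 1]


def corr_4d(feat_q, feat_t, q_yx, t_yx, r=3):
    """corr[a, b, d, e] = dot(query-neighborhood vector (a, b), target-neighborhood vector (d, e))."""
    pq = local_patch(feat_q, q_yx[0], q_yx[1], r)
    pt = local_patch(feat_t, t_yx[0], t_yx[1], r)
    return np.einsum("abc,dec->abde", pq, pt)


def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron with a ReLU in between."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2


rng = np.random.default_rng(0)
H, W, C, r, hidden, out_dim = 32, 32, 8, 3, 64, 16
k = 2 * r + 1
feat_q = rng.standard_normal((H, W, C))  # stand-in for CNN features of the query frame
feat_t = rng.standard_normal((H, W, C))  # stand-in for CNN features of the current frame

corr = corr_4d(feat_q, feat_t, (10, 12), (11, 13), r)  # 4D volume of shape (k, k, k, k)
w1 = rng.standard_normal((k ** 4, hidden)) * 0.01       # untrained, illustrative weights
w2 = rng.standard_normal((hidden, out_dim)) * 0.01
emb = mlp(corr.reshape(-1), w1, np.zeros(hidden), w2, np.zeros(out_dim))
print(corr.shape, emb.shape)  # (7, 7, 7, 7) (16,)
```

Flattening the correlation volume into an MLP keeps this stage simple; in the full tracker, an embedding like `emb` would be one of the inputs the transformer uses when refining the track, visibility, and confidence estimates.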

CoTracker3 is considerably leaner and faster than other trackers in this field. It has half as many parameters as its predecessor, CoTracker, and it runs 27% faster than the fastest existing tracker, thanks to its global matching strategy and its use of an MLP. CoTracker3 is highly competitive with other trackers across various benchmarks, and in some cases it even surpasses state-of-the-art models. Comparing CoTracker3's online and offline models, the offline version tracked occluded points more effectively, while the online version can run in real time without the memory constraints of processing the whole video at once.

CoTracker3 takes inspiration from earlier models and combines their strengths in a smaller package. It uses a simple semi-supervised training protocol: real videos are annotated by various off-the-shelf trackers, and the resulting pseudo-labels are used to fine-tune a model that outperforms other trackers, showing that beauty does lie in simplicity.


The post Meta AI Releases Cotracker3: A Semi-Supervised Tracker that Produces Better Results with Unlabelled Data and Simple Architecture appeared first on MarkTechPost.
