MarkTechPost@AI 2024年11月06日
DELTA: A Novel AI Method that Efficiently (10x Faster) Tracks Every Pixel in 3D Space from Monocular Videos

DELTA is a method designed to efficiently track every pixel in 3D space across long video sequences. Existing approaches face many challenges in achieving detailed 3D tracking; DELTA instead performs low-resolution tracking via spatio-temporal attention and applies an attention-based upsampler for high-resolution precision, achieving strong results across multiple datasets.

🎯 DELTA is the first method to efficiently track every pixel in 3D space across long video sequences

💡 It performs low-resolution tracking via spatio-temporal attention and sharpens precision with an attention-based upsampler

🎉 It achieves leading results on multiple datasets, such as CVO and Kubric3D, with both high speed and high accuracy

⚠️ Some limitations remain, such as handling points occluded for long periods and constraints on video length

Tracking dense 3D motion from monocular videos remains challenging, particularly when aiming for pixel-level precision over long sequences. Existing methods struggle to achieve detailed 3D tracking because they often track only a sparse set of points, which is too coarse for full-scene understanding. They also demand substantial computational power, making long videos difficult to process efficiently. In addition, many struggle to maintain accuracy over extended sequences, as camera motion and object occlusion cause the model to lose track or accumulate errors.

Current methods include several approaches for estimating motion in video sequences, each with unique strengths and limitations. Optical flow techniques provide dense pixel-wise tracking but struggle with robustness in complex scenes, especially when extended to long sequences. Scene flow generalizes optical flow to estimate dense 3D motion, using either RGB-D data or point clouds, but it remains challenging to apply efficiently over long sequences. Point tracking captures motion trajectories by tracking specific points, with recent advances incorporating spatial and temporal attention for smoother tracking; however, point-tracking methods still fall short of dense tracking due to their high computational cost. Tracking-by-reconstruction methods use a deformation field to estimate motion, making them less practical for real-time applications.

A team of researchers from UMass Amherst, the MIT-IBM Watson AI Lab, and Snap Inc. has proposed DELTA (Dense Efficient Long-range 3D Tracking for Any video), the first method designed to efficiently track every pixel in 3D space across long video sequences. DELTA starts with reduced-resolution tracking via spatio-temporal attention and then applies an attention-based upsampler for high-resolution accuracy. Key innovations include an upsampler that preserves sharp motion boundaries, an efficient spatial attention architecture for dense tracking, and a log-depth representation that enhances tracking performance. DELTA achieves state-of-the-art results on the CVO and Kubric3D datasets, showing over 10% improvement in metrics like Average Jaccard (AJ) and Average Position Difference in 3D (APD3D), and performs competitively on 3D point tracking benchmarks such as TAP-Vid3D and LSFOdyssey. Unlike existing methods, DELTA delivers dense 3D tracking at scale, running over 8x faster than previous methods while achieving state-of-the-art accuracy.
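A log-depth representation like the one mentioned above compresses far-field depth variation so that a network predicts depth on a roughly uniform scale. The following is a minimal sketch of one such parameterization; the exact mapping and the `d_min`/`d_max` clipping bounds here are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def to_log_depth(depth, d_min=0.1, d_max=100.0):
    """Map metric depth to a normalized log-depth value in [0, 1].

    Clipping to [d_min, d_max] keeps the normalization well-defined;
    the log makes equal prediction errors correspond to roughly equal
    relative depth errors, near or far.
    """
    log_d = np.log(np.clip(depth, d_min, d_max))
    return (log_d - np.log(d_min)) / (np.log(d_max) - np.log(d_min))

def from_log_depth(log_depth, d_min=0.1, d_max=100.0):
    """Invert the normalized log-depth back to metric depth."""
    return np.exp(log_depth * (np.log(d_max) - np.log(d_min)) + np.log(d_min))
```

Under this parameterization, a fixed error in the predicted value corresponds to a fixed *relative* error in metric depth, which is one plausible reason such a representation helps 3D tracking.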

Experiments show that DELTA excels in 3D tracking tasks, outperforming previous methods in both speed and accuracy. Trained on Kubric's dataset of over 5,600 videos, DELTA uses a loss function that combines 2D coordinate, depth, and visibility losses.
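A combined loss of that form can be sketched as follows. This is a hypothetical NumPy reconstruction: the L1/cross-entropy choices, the equal default weights, and all parameter names are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def combined_track_loss(pred_xy, gt_xy, pred_logd, gt_logd,
                        pred_vis_logit, gt_vis,
                        w_xy=1.0, w_d=1.0, w_vis=1.0):
    """Sum of a 2D-coordinate term, a depth term, and a visibility term.

    pred_xy/gt_xy: (N, 2) pixel coordinates; pred_logd/gt_logd: (N,)
    log-depth values; pred_vis_logit: (N,) raw logits; gt_vis: (N,)
    binary visibility labels.
    """
    l_xy = np.abs(pred_xy - gt_xy).mean()          # L1 on 2D coordinates
    l_d = np.abs(pred_logd - gt_logd).mean()       # L1 on log-depth
    p = 1.0 / (1.0 + np.exp(-pred_vis_logit))      # sigmoid of logits
    eps = 1e-7                                     # numerical safety
    l_vis = -(gt_vis * np.log(p + eps)
              + (1 - gt_vis) * np.log(1 - p + eps)).mean()
    return w_xy * l_xy + w_d * l_d + w_vis * l_vis
```

Supervising visibility alongside position lets the tracker learn when a point is occluded rather than forcing a position estimate to absorb the error.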

In benchmarks, DELTA achieved top scores on CVO for long-range 2D tracking and on Kubric3D for dense 3D tracking, completing tasks much faster than other methods. DELTA’s design choices, including log-depth representation, spatial attention, and an attention-based upsampler, significantly enhance its accuracy and efficiency across diverse tracking scenarios.
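The attention-based upsampler idea can be illustrated with a toy sketch: each high-resolution pixel forms a query, attends over the 3x3 coarse-grid neighborhood around its parent cell, and outputs a softmax-weighted blend of the coarse predictions. Everything here, including the neighborhood size, feature shapes, and function name, is a simplified assumption and not DELTA's actual architecture.

```python
import numpy as np

def attention_upsample(coarse, q_hi, k_lo, scale=4):
    """Toy attention-based upsampler.

    coarse: (h, w, c) low-res predictions (e.g. motion vectors);
    q_hi:   (H, W, d) per-pixel query features, H = h*scale, W = w*scale;
    k_lo:   (h, w, d) key features on the coarse grid.
    Each output pixel is an attention-weighted blend of its 3x3 coarse
    neighborhood, which lets upsampled motion follow image boundaries
    instead of bilinear blurring across them.
    """
    h, w, c = coarse.shape
    H, W = h * scale, w * scale
    out = np.zeros((H, W, c))
    for i in range(H):
        for j in range(W):
            ci, cj = i // scale, j // scale  # parent coarse cell
            scores, vals = [], []
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = ci + di, cj + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        scores.append(q_hi[i, j] @ k_lo[ni, nj])
                        vals.append(coarse[ni, nj])
            scores = np.array(scores)
            wts = np.exp(scores - scores.max())  # stable softmax
            wts /= wts.sum()
            out[i, j] = (wts[:, None] * np.array(vals)).sum(axis=0)
    return out
```

Because the weights come from learned feature similarity rather than fixed interpolation kernels, such an upsampler can keep motion boundaries sharp where neighboring coarse cells disagree.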

In conclusion, DELTA is a highly efficient method for tracking every pixel across video frames, achieving high accuracy in dense 2D and 3D tracking with a faster runtime than existing methods. The model may struggle with points that remain occluded for extended periods and performs best on videos with fewer than several hundred frames. Like earlier methods, it is limited by its use of shorter temporal processing windows. Moreover, its 3D tracking accuracy depends on the precision and temporal stability of the monocular depth estimation it builds on, so anticipated improvements in monocular depth estimation research will likely enhance its performance further.


Check out the Paper and Project. All credit for this research goes to the researchers of this project.


