MarkTechPost@AI · March 6
MVGD from Toyota Research Institute: Zero Shot 3D Scene Reconstruction

Toyota Research Institute has introduced Multi-View Geometric Diffusion (MVGD), an innovative diffusion-based architecture that synthesizes high-fidelity novel RGB and depth maps directly from sparse, posed images. It needs no explicit 3D representation such as a NeRF or 3D Gaussian splats. Instead, MVGD integrates implicit 3D reasoning directly into a single diffusion model, generating images and depth maps that remain scale-aligned and geometrically consistent with the input images, without constructing an intermediate 3D model. Trained on more than 60 million multi-view image samples, MVGD generalizes strongly, performing well in unseen domains without explicit fine-tuning. By eliminating explicit 3D representations, MVGD simplifies the 3D pipeline, enhances realism, and improves scalability and adaptability.

💡 Pixel-level diffusion preserves image detail: unlike latent diffusion models, MVGD uses a token-based architecture that operates at the original image resolution, retaining fine detail.

🎨 Joint task embedding: through a multi-task design, MVGD generates RGB images and depth maps jointly, exploiting a unified geometric and visual prior.

📏 Scene-scale normalization: MVGD automatically normalizes scene scale from the input camera poses, ensuring geometric coherence across different datasets.

🚀 Incremental conditioning and scalable fine-tuning enhance versatility: incremental conditioning refines generated novel views by feeding them back into the model, while scalable fine-tuning lets the model grow progressively, improving performance without extensive retraining.
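The scene-scale normalization idea above can be illustrated with a minimal sketch: rescale the camera poses so the average camera distance from the scene centroid is 1. This is only an illustrative version; the exact normalization scheme MVGD uses is described in the paper, and `normalize_scene_scale` is a hypothetical helper, not part of any released API.

```python
import numpy as np

def normalize_scene_scale(poses):
    """Rescale camera-to-world poses so the mean camera distance from
    the scene centroid equals 1 (illustrative sketch only).

    poses: (N, 4, 4) array of camera-to-world transforms.
    Returns the normalized poses and the scale factor that was divided out.
    """
    centers = poses[:, :3, 3]                       # camera centers in world space
    centroid = centers.mean(axis=0)
    scale = np.linalg.norm(centers - centroid, axis=1).mean()
    normalized = poses.copy()
    normalized[:, :3, 3] = (centers - centroid) / scale
    return normalized, scale
```

Because every dataset is mapped to the same canonical scale, depth predictions made in this normalized frame stay geometrically coherent across datasets and can be mapped back to metric units by multiplying by the returned scale.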

Researchers at Toyota Research Institute have unveiled Multi-View Geometric Diffusion (MVGD), a groundbreaking diffusion-based architecture that directly synthesizes high-fidelity novel RGB and depth maps from sparse, posed images, bypassing the need for explicit 3D representations like NeRF or 3D Gaussian splats. This innovation promises to redefine the frontier of 3D synthesis by offering a streamlined, robust, and scalable solution for generating realistic 3D content.

The core challenge MVGD addresses is achieving multi-view consistency: ensuring generated novel viewpoints seamlessly integrate in 3D space. Traditional methods rely on building complex 3D models, which often suffer from memory constraints, slow training, and limited generalization. MVGD, however, integrates implicit 3D reasoning directly into a single diffusion model, generating images and depth maps that maintain scale alignment and geometric coherence with input images without intermediate 3D model construction.

MVGD leverages the power of diffusion models, known for their high-fidelity image generation, to encode appearance and depth information simultaneously.
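The joint encoding of appearance and depth can be pictured as a single reverse-diffusion update over a 4-channel state (3 RGB channels plus 1 depth channel), so one model denoises geometry and appearance together. This is a generic DDIM-style sketch under stated assumptions, not MVGD's actual token-based network or sampler; `predict_noise` is a hypothetical stand-in for the learned denoiser.

```python
import numpy as np

def denoise_step(x_t, t, predict_noise, alpha_bar):
    """One deterministic (DDIM-style) denoising step over a joint state.

    x_t: (H, W, 4) noisy sample, channels = RGB (3) + depth (1), so a
         single model updates appearance and geometry together.
    predict_noise: callable (x_t, t) -> predicted noise, same shape as x_t
                   (hypothetical stand-in for the diffusion network).
    alpha_bar: 1-D array of cumulative noise-schedule products.
    """
    eps = predict_noise(x_t, t)
    a_t = alpha_bar[t]
    a_prev = alpha_bar[t - 1] if t > 0 else 1.0
    # Estimate the clean RGB+depth sample, then step to the previous noise level.
    x0_hat = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
```

Iterating this step from pure noise down to t = 0, with the network conditioned on the input posed images, yields an RGB image and a depth map in one pass, which is the essence of the joint design.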

Key innovative components include:

- Pixel-level, token-based diffusion that operates at the original image resolution, preserving fine detail that latent diffusion models lose.
- A joint task embedding that generates RGB images and depth maps together from a unified geometric and visual prior.
- Automatic scene-scale normalization derived from the input camera poses.
- Incremental conditioning and scalable fine-tuning, which refine generated views and grow the model without extensive retraining.

Training at an unprecedented scale, on over 60 million multi-view image samples from real-world and synthetic datasets, empowers MVGD with exceptional generalization: it performs well in unseen domains zero-shot, without explicit fine-tuning.

MVGD achieves state-of-the-art performance on benchmarks like RealEstate10K, CO3Dv2, and ScanNet, surpassing or matching existing methods in both novel view synthesis and multi-view depth estimation.

MVGD introduces incremental conditioning and scalable fine-tuning, enhancing its versatility and efficiency.
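Incremental conditioning can be sketched as a simple loop: each newly synthesized view is appended to the conditioning set, so later target viewpoints are generated with a progressively richer context. `model_generate` below is a hypothetical stand-in for an MVGD sampling call, not a real API.

```python
def incremental_view_synthesis(model_generate, context, target_poses):
    """Sketch of incremental conditioning (illustrative only).

    model_generate: callable (context, pose) -> synthesized RGB+depth view
                    (hypothetical stand-in for the diffusion sampler).
    context: list of initial conditioning views (the sparse posed inputs).
    target_poses: camera poses to synthesize, in order.
    """
    outputs = []
    for pose in target_poses:
        view = model_generate(context, pose)  # synthesize with current context
        outputs.append(view)
        context = context + [view]            # feed the result back as conditioning
    return outputs
```

Feeding generated views back in this way improves multi-view consistency for later targets, at the cost of sequential rather than parallel generation.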

The implications of MVGD are significant: by eliminating explicit 3D representations, it simplifies 3D pipelines, enhances photorealism, and improves scalability and adaptability.

MVGD represents a significant leap forward in 3D synthesis, merging diffusion elegance with robust geometric cues to deliver photorealistic imagery and scale-aware depth. This breakthrough signals the emergence of “geometry-first” diffusion models, poised to revolutionize immersive content creation, autonomous navigation, and spatial AI.


Check out the paper. All credit for this research goes to the researchers of this project.


