NVIDIA AI Released DiffusionRenderer: An AI Model for Editable, Photorealistic 3D Scenes from a Single Video

Researchers from NVIDIA, the University of Toronto, and other institutions have jointly released DiffusionRenderer, a revolutionary AI framework that can understand and manipulate 3D scenes from a single video. The technology tackles a long-standing video-editing problem, letting users easily change lighting and materials and seamlessly insert new elements, giving filmmakers, designers, and content creators a powerful editing tool. Through an innovative data strategy and advanced neural rendering techniques, DiffusionRenderer enables precise editing of real-world videos and advances the use of AI in content creation.

💡 At the core of DiffusionRenderer is a unique framework design that unifies a scene's properties (the "what") and the rendering process (the "how") within a single framework built on a video diffusion architecture, similar to Stable Video Diffusion.

🔍 The framework contains two key neural renderers: a neural inverse renderer and a neural forward renderer. The neural inverse renderer analyzes the input video and estimates the scene's intrinsic properties, generating G-buffers that describe its geometry and materials; the neural forward renderer then synthesizes a photorealistic video from the G-buffers and lighting information.

📊 To bridge the gap between perfect physics and imperfect reality, the researchers adopted an innovative data strategy. They built a high-quality synthetic dataset of 150,000 videos and used automatic labeling to generate G-buffers for 10,510 real-world videos, training the model on both.

✨ DiffusionRenderer demonstrates excellent performance across a range of tasks, including forward rendering, inverse rendering, and relighting. It produces more accurate reflections and high-fidelity lighting, offering users dynamic relighting, intuitive material editing, and seamless object insertion.

AI-powered video generation is improving at a breathtaking pace. In a short time, we’ve gone from blurry, incoherent clips to generated videos with stunning realism. Yet, for all this progress, a critical capability has been missing: control and editability.

While generating a beautiful video is one thing, the ability to professionally and realistically edit it—to change the lighting from day to night, swap an object’s material from wood to metal, or seamlessly insert a new element into the scene—has remained a formidable, largely unsolved problem. This gap has been the key barrier preventing AI from becoming a truly foundational tool for filmmakers, designers, and creators.

That is, until the introduction of DiffusionRenderer.

In a groundbreaking new paper, researchers at NVIDIA, the University of Toronto, the Vector Institute, and the University of Illinois Urbana-Champaign have unveiled a framework that directly tackles this challenge. DiffusionRenderer represents a revolutionary leap forward, moving beyond mere generation to offer a unified solution for understanding and manipulating 3D scenes from a single video. It effectively bridges the gap between generation and editing, unlocking the true creative potential of AI-driven content.

The Old Way vs. The New Way: A Paradigm Shift

For decades, photorealism has been anchored in physically based rendering (PBR), a methodology that meticulously simulates the flow of light. While it produces stunning results, it’s a fragile system. PBR is critically dependent on having a perfect digital blueprint of a scene—precise 3D geometry, detailed material textures, and accurate lighting maps. The process of capturing this blueprint from the real world, known as inverse rendering, is notoriously difficult and error-prone. Even small imperfections in this data can cause catastrophic failures in the final render, a key bottleneck that has limited PBR’s use outside of controlled studio environments.
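For readers unfamiliar with PBR, the quantity a path tracer numerically approximates is the classical rendering equation shown below; inverse rendering is the much harder problem of recovering the material term and the incident lighting from observed pixels. This is standard graphics background, not notation introduced by the DiffusionRenderer paper.

```latex
% Outgoing radiance at surface point x in direction w_o:
% emitted light plus BRDF-weighted incoming light integrated over the hemisphere.
L_o(\mathbf{x}, \omega_o) = L_e(\mathbf{x}, \omega_o)
  + \int_{\Omega} f_r(\mathbf{x}, \omega_i, \omega_o)\,
    L_i(\mathbf{x}, \omega_i)\, (\omega_i \cdot \mathbf{n})\, \mathrm{d}\omega_i
```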

Previous neural rendering techniques like NeRFs, while revolutionary for creating static views, hit a wall when it came to editing. They “bake” lighting and materials into the scene, making post-capture modifications nearly impossible.

DiffusionRenderer treats the “what” (the scene’s properties) and the “how” (the rendering) in one unified framework built on the same powerful video diffusion architecture that underpins models like Stable Video Diffusion.

This method uses two neural renderers to process video:

    A neural inverse renderer analyzes the input video and estimates the scene’s intrinsic properties, producing per-pixel G-buffers that describe its geometry and materials (channels such as normals, albedo, roughness, and metallic).
    A neural forward renderer takes those G-buffers plus lighting information and synthesizes a photorealistic video, reproducing effects such as shadows and inter-reflections.

[Figure] Inverse rendering example: DiffusionRenderer predicts finer details in thin structures and accurate metallic and roughness channels (top), and generalizes impressively to outdoor scenes (bottom row).
[Figure] Forward rendering example: the method generates high-quality inter-reflections (top) and shadows (bottom), producing more accurate results than the neural baselines (Path Traced GT is the ground truth).

This synergy between the two renderers is the core of the breakthrough: the system is designed for the messiness of the real world, where perfect data is a myth.
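To make the intermediate representation concrete, here is a minimal sketch of what a per-frame G-buffer could look like as a data structure. The channel names follow the attributes discussed above; the `GBuffer` class and `make_dummy_gbuffer` helper are illustrative placeholders, not the released code.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GBuffer:
    """Per-frame intrinsic maps estimated by a neural inverse renderer (illustrative)."""
    albedo: np.ndarray     # (H, W, 3) base color, independent of lighting
    normals: np.ndarray    # (H, W, 3) surface orientation per pixel
    roughness: np.ndarray  # (H, W)    microfacet roughness in [0, 1]
    metallic: np.ndarray   # (H, W)    metalness in [0, 1]
    depth: np.ndarray      # (H, W)    distance from the camera

def make_dummy_gbuffer(h: int = 256, w: int = 256) -> GBuffer:
    """Create a placeholder G-buffer, e.g. for wiring up a pipeline before real outputs exist."""
    return GBuffer(
        albedo=np.zeros((h, w, 3), dtype=np.float32),
        normals=np.tile(np.array([0.0, 0.0, 1.0], dtype=np.float32), (h, w, 1)),
        roughness=np.full((h, w), 0.5, dtype=np.float32),
        metallic=np.zeros((h, w), dtype=np.float32),
        depth=np.ones((h, w), dtype=np.float32),
    )
```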

The Secret Sauce: A Novel Data Strategy to Bridge the Reality Gap

A smart model is nothing without smart data. The researchers behind DiffusionRenderer devised an ingenious two-pronged data strategy to teach their model the nuances of both perfect physics and imperfect reality.

    1. A Massive Synthetic Universe: First, they built a vast, high-quality synthetic dataset of 150,000 videos. Using thousands of 3D objects, PBR materials, and HDR light maps, they created complex scenes and rendered them with a perfect path-tracing engine. This gave the inverse rendering model a flawless “textbook” to learn from, providing it with perfect ground-truth data.
    2. Auto-Labeling the Real World: The team found that the inverse renderer, trained only on synthetic data, was surprisingly good at generalizing to real videos. They unleashed it on a massive dataset of 10,510 real-world videos (DL3DV10k), and the model automatically generated G-buffer labels for this footage. The result is a colossal 150,000-sample dataset of real scenes with corresponding, albeit imperfect, intrinsic property maps (a rough sketch of this labeling loop follows below).
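The auto-labeling idea in item 2 boils down to running a synthetically trained inverse renderer over real clips and storing its predictions as pseudo-labels. The sketch below illustrates that loop; `load_video`, `inverse_renderer`, and the file layout are hypothetical stand-ins, not NVIDIA’s actual tooling.

```python
from pathlib import Path
import numpy as np

def auto_label_real_videos(video_paths, load_video, inverse_renderer, out_dir: str) -> None:
    """Pseudo-label real footage with a synthetically trained inverse renderer (illustrative).

    load_video(path) is assumed to return a (T, H, W, 3) float array of frames, and
    inverse_renderer(frames) a dict of per-frame intrinsic maps such as
    {"albedo": ..., "normals": ..., "roughness": ..., "metallic": ..., "depth": ...}.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in video_paths:
        frames = load_video(path)               # real-world clip, no ground-truth labels
        gbuffers = inverse_renderer(frames)     # imperfect but usable pseudo-labels
        # Store the predicted maps under the clip name so they can be paired for co-training.
        np.savez_compressed(out / f"{Path(path).stem}_gbuffers.npz", **gbuffers)
```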

By co-training the forward renderer on both the perfect synthetic data and the auto-labeled real-world data, the model learned to bridge the critical “domain gap.” It learned the rules from the synthetic world and the look and feel of the real world. To handle the inevitable inaccuracies in the auto-labeled data, the team incorporated a LoRA (Low-Rank Adaptation) module, a clever technique that allows the model to adapt to the noisier real data without compromising the knowledge gained from the pristine synthetic set.
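LoRA itself is a general technique: instead of updating a full weight matrix W, you learn a low-rank correction BA and compute Wx + BAx, leaving the pretrained weights untouched. Below is a minimal PyTorch sketch of wrapping a linear layer this way for a noisier data branch; it illustrates the general mechanism only and is not the DiffusionRenderer training code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Linear layer with a frozen base weight and a trainable low-rank update (illustrative)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # keep pretrained weights fixed
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base projection plus the scaled low-rank correction (B @ A) applied to x.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

# Usage sketch: wrap a projection inside a (hypothetical) diffusion backbone so that
# only the LoRA parameters adapt when training on auto-labeled real-world batches.
proj = nn.Linear(320, 320)
adapted = LoRALinear(proj, rank=8)
out = adapted(torch.randn(4, 320))
```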

State-of-the-Art Performance

The results speak for themselves. In rigorous head-to-head comparisons against both classic and neural state-of-the-art methods, DiffusionRenderer consistently came out on top across all evaluated tasks by a wide margin:

[Figure] Forward rendering comparison: DiffusionRenderer’s outputs closely match the ground truth (Path Traced GT).
[Figure] Relighting evaluation against competing methods.

What You Can Do With DiffusionRenderer: Powerful Editing

This research unlocks a suite of practical and powerful editing applications that operate from a single, everyday video. The workflow is simple: the model first performs inverse rendering to understand the scene, the user edits the properties, and the model then performs forward rendering to create a new photorealistic video.
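In code terms, the workflow just described could look something like the sketch below: estimate intrinsic maps from the input clip, modify them, and re-render. Every function name here is a hypothetical placeholder standing in for the released models, not their actual API.

```python
from typing import Callable, Dict, Optional
import numpy as np

def edit_video(frames: np.ndarray,
               inverse_renderer: Callable[[np.ndarray], Dict[str, np.ndarray]],
               forward_renderer: Callable[[Dict[str, np.ndarray], Optional[np.ndarray]], np.ndarray],
               new_env_map: Optional[np.ndarray] = None,
               metallic_value: Optional[float] = None) -> np.ndarray:
    """Relight and/or re-material a clip (illustrative pipeline with a hypothetical API).

    frames: (T, H, W, 3) input video.
    """
    # 1) Understand the scene: estimate per-pixel geometry and material maps.
    gbuffers = inverse_renderer(frames)

    # 2) Edit the intrinsic properties, e.g. push every surface toward metal.
    if metallic_value is not None:
        gbuffers["metallic"] = np.full_like(gbuffers["metallic"], metallic_value)

    # 3) Pick the lighting: keep whatever estimate came with the scene, or swap in a new HDR map.
    env_map = new_env_map if new_env_map is not None else gbuffers.get("env_map")

    # 4) Re-render a photorealistic video from the edited buffers and chosen lighting.
    return forward_renderer(gbuffers, env_map)
```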

A New Foundation for Graphics

DiffusionRenderer represents a definitive breakthrough. By holistically solving inverse and forward rendering within a single, robust, data-driven framework, it tears down the long-standing barriers of traditional PBR. It democratizes photorealistic rendering, moving it from the exclusive domain of VFX experts with powerful hardware to a more accessible tool for creators, designers, and AR/VR developers.

In a recent update, the authors further improve video de-lighting and re-lighting by leveraging NVIDIA Cosmos and enhanced data curation.

This demonstrates a promising scaling trend: as the underlying video diffusion model grows more powerful, the output quality improves, yielding sharper, more accurate results.

These improvements make the technology even more compelling.

The new model is released under Apache 2.0 and the NVIDIA Open Model License and is publicly available.

Thanks to the NVIDIA team for the thought leadership and resources for this article. The NVIDIA team has supported and sponsored this content.

