NVIDIA Research has developed an AI light switch for videos that can turn daytime scenes into nightscapes, transform sunny afternoons into cloudy days and tone down harsh fluorescent lighting into soft, natural illumination.
Called DiffusionRenderer, it’s a new technique for neural rendering — a process that uses AI to approximate how light behaves in the real world. It brings together two traditionally distinct processes — inverse rendering and forward rendering — in a unified neural rendering engine that outperforms state-of-the-art methods.
DiffusionRenderer provides a framework for video lighting control, editing and synthetic data augmentation, making it a powerful tool for creative industries and physical AI development.
Creators in advertising, film and game development could use applications based on DiffusionRenderer to add, remove and edit lighting in real-world or AI-generated videos. Physical AI developers could use it to augment synthetic datasets with a greater diversity of lighting conditions to train models for robotics and autonomous vehicles (AVs).
DiffusionRenderer is one of over 60 NVIDIA papers accepted to the Computer Vision and Pattern Recognition (CVPR) conference, taking place June 11-15 in Nashville, Tennessee.
Creating AI That Delights
DiffusionRenderer tackles the challenge of de-lighting and relighting a scene from only 2D video data.
De-lighting is a process that takes an image and removes its lighting effects, so that only the underlying object geometry and material properties remain. Relighting does the opposite, adding or editing light in a scene while maintaining the realism of complex properties like object transparency and specularity — how a surface reflects light.
Classic, physically based rendering pipelines need 3D geometry data to calculate light in a scene for de-lighting and relighting. DiffusionRenderer instead uses AI to estimate properties including normals, metallicity and roughness from a single 2D video.
With these calculations, DiffusionRenderer can generate new shadows and reflections, change light sources, edit materials and insert new objects into a scene — all while maintaining realistic lighting conditions.
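To make the two-stage idea concrete, below is a minimal, hypothetical Python sketch: inverse rendering produces per-pixel scene properties (the "de-lit" buffers), and forward rendering re-shades them under new lighting. The snippet uses a simple Lambertian model with placeholder buffers purely for illustration; DiffusionRenderer itself performs both stages with neural video diffusion models rather than analytic shading, and none of these function or variable names come from its actual interface.

```python
import numpy as np

# Toy illustration of the de-light / relight idea, not the DiffusionRenderer API.
# Assume inverse rendering has already produced per-pixel buffers
# (albedo, surface normals); here they are random placeholders.
H, W = 4, 4
albedo = np.random.rand(H, W, 3)            # base color with lighting removed ("de-lit")
normals = np.random.rand(H, W, 3) * 2 - 1   # surface orientation per pixel
normals /= np.linalg.norm(normals, axis=-1, keepdims=True)

def relight(albedo, normals, light_dir, light_color):
    """Shade the de-lit buffers under a new directional light (Lambertian toy model)."""
    light_dir = np.asarray(light_dir, dtype=float)
    light_dir /= np.linalg.norm(light_dir)
    shading = np.clip(normals @ light_dir, 0.0, None)[..., None]  # cosine falloff
    return albedo * shading * np.asarray(light_color)

# "Daytime" vs. "nighttime" renders of the same underlying scene properties.
day = relight(albedo, normals, light_dir=[0.0, 1.0, 0.5], light_color=[1.0, 0.95, 0.9])
night = relight(albedo, normals, light_dir=[0.3, 0.2, 1.0], light_color=[0.1, 0.12, 0.2])
print(day.shape, night.shape)
```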
Using an application powered by DiffusionRenderer, AV developers could take a dataset of mostly daytime driving footage and randomize the lighting of every video clip to create more clips representing cloudy or rainy days, evenings with harsh lighting and shadows, and nighttime scenes. With this augmented data, developers can boost their development pipelines to train, test and validate AV models that are better equipped to handle challenging lighting conditions.
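As a rough sketch of what such an augmentation loop might look like, the hypothetical Python below pairs each daytime clip with randomly chosen relit variants. The `relight_clip` function, the preset names and the file naming are all placeholders standing in for whatever DiffusionRenderer-powered tool performs the actual relighting; they are not a real API.

```python
import random

# Hypothetical lighting presets an AV team might want more coverage of.
LIGHTING_PRESETS = ["overcast", "rain", "dusk_harsh_shadows", "night_streetlights"]

def relight_clip(clip_path: str, preset: str) -> str:
    # Placeholder: a real pipeline would invoke the relighting model here
    # and write out a new video clip; this just returns a descriptive name.
    return f"{clip_path}.{preset}.mp4"

def augment_dataset(clip_paths, variants_per_clip=2, seed=0):
    """Generate randomized relit variants for each source clip."""
    rng = random.Random(seed)
    augmented = []
    for path in clip_paths:
        for preset in rng.sample(LIGHTING_PRESETS, variants_per_clip):
            augmented.append(relight_clip(path, preset))
    return augmented

print(augment_dataset(["drive_0001.mp4", "drive_0002.mp4"]))
```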
Creators who capture content for digital character creation or special effects could use DiffusionRenderer to power a tool for early ideation and mockups — enabling them to explore and iterate through various lighting options before moving to expensive, specialized light stage systems to capture production-quality footage.
Enhancing DiffusionRenderer With NVIDIA Cosmos
Since completing the original paper, the research team behind DiffusionRenderer has integrated the method with Cosmos Predict-1, a suite of world foundation models for generating realistic, physics-aware future world states.
By doing so, the researchers observed a scaling effect: applying Cosmos Predict's larger, more powerful video diffusion model correspondingly boosted the quality of DiffusionRenderer's de-lighting and relighting, enabling sharper, more accurate and temporally consistent results.
Cosmos Predict is part of NVIDIA Cosmos, a platform of world foundation models, tokenizers, guardrails and an accelerated data processing and curation pipeline, all built to speed up synthetic data generation for physical AI development. Read about the new Cosmos Predict-2 model on the NVIDIA Technical Blog.
NVIDIA Research at CVPR
At CVPR, NVIDIA researchers are presenting dozens of papers on topics spanning automotive, healthcare, robotics and more. Three NVIDIA papers are nominated for this year’s Best Paper Award:
- FoundationStereo: This foundation model reconstructs 3D information from 2D images by matching pixels in stereo images. Trained on a dataset of over 1 million images, the model works out-of-the-box on real-world data, outperforming existing methods and generalizing across domains.
- Zero-Shot Monocular Scene Flow Estimation in the Wild: A collaboration between researchers at NVIDIA and Brown University, this paper introduces a generalizable model for predicting scene flow, the motion field of points in a 3D environment.
- Difix3D+: This paper, by researchers from the NVIDIA Spatial Intelligence Lab, introduces an image diffusion model that removes artifacts from novel viewpoints in reconstructed 3D scenes, enhancing the overall quality of 3D representations.
NVIDIA was also named an Autonomous Grand Challenge winner at CVPR, marking the second consecutive year NVIDIA topped the leaderboard in the end-to-end category — and the third consecutive year winning an Autonomous Grand Challenge award at the conference.
Learn more about NVIDIA Research, a global team of hundreds of scientists and engineers focused on topics including AI, computer graphics, computer vision, self-driving cars and robotics.