MIT News - Machine learning, December 4, 2024
A new way to create realistic 3D shapes using generative AI

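Generating realistic 3D models has long been a challenge in applications such as virtual reality, filmmaking, and engineering design. Generative AI models can produce lifelike 2D images from text prompts, but they are not designed to generate 3D shapes directly. MIT researchers addressed the blurry or cartoonish output of 3D generation by improving a technique called Score Distillation. They identified the root cause of the low-quality 3D models in the algorithm and used an approximation to handle a formula that is too complex to solve efficiently, producing sharp, high-quality 3D shapes that are closer in quality to the best model-generated 2D images. The work points to new directions for improving 3D generation and could eventually serve as a co-pilot for designers, simplifying the creation of 3D shapes.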

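🤔**Score Distillation is used to generate 3D models, but its output is often blurry or cartoonish.** By analyzing the technique, MIT researchers found a mismatch between a key formula in the process and its counterpart in 2D diffusion models, which leads to low-quality 3D models.

💡**The researchers improved 3D model quality by approximating a complex formula.** Standard Score Distillation replaces part of a formula that is too complex to solve efficiently with randomly sampled noise, and this noise degrades the sharpness of the 3D model. The MIT researchers instead use an approximation that infers the missing term from the current 3D shape rendering, producing sharp, realistic 3D shapes.

🚀**The method produces high-quality 3D shapes without additional training or complex postprocessing.** Compared with approaches that retrain or fine-tune the generative AI model, the MIT technique achieves 3D shape quality on par with or better than theirs, without the added training cost or complex postprocessing.

💻**The method could serve as a co-pilot for designers, simplifying the creation of 3D shapes.** By improving 3D generation, the researchers hope to give designers a more convenient tool that makes it easier to create realistic 3D shapes.

⚠️**The method still has limitations, such as being prone to hallucinations and other failures.** Because it relies on a pretrained diffusion model, it inherits that model's biases and shortcomings. Improving the underlying diffusion model would improve the results.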

Creating realistic 3D models for applications like virtual reality, filmmaking, and engineering design can be a cumbersome process requiring lots of manual trial and error.

While generative artificial intelligence models for images can streamline artistic processes by enabling creators to produce lifelike 2D images from text prompts, these models are not designed to generate 3D shapes. To bridge the gap, a recently developed technique called Score Distillation leverages 2D image generation models to create 3D shapes, but its output often ends up blurry or cartoonish.

MIT researchers explored the relationships and differences between the algorithms used to generate 2D images and 3D shapes, identifying the root cause of lower-quality 3D models. From there, they crafted a simple fix to Score Distillation, which enables the generation of sharp, high-quality 3D shapes that are closer in quality to the best model-generated 2D images.
 


Some other methods try to fix this problem by retraining or fine-tuning the generative AI model, which can be expensive and time-consuming.

By contrast, the MIT researchers’ technique achieves 3D shape quality on par with or better than these approaches without additional training or complex postprocessing.

Moreover, by identifying the cause of the problem, the researchers have improved mathematical understanding of Score Distillation and related techniques, enabling future work to further improve performance.

“Now we know where we should be heading, which allows us to find more efficient solutions that are faster and higher-quality,” says Artem Lukoianov, an electrical engineering and computer science (EECS) graduate student who is lead author of a paper on this technique. “In the long run, our work can help facilitate the process to be a co-pilot for designers, making it easier to create more realistic 3D shapes.”

Lukoianov’s co-authors are Haitz Sáez de Ocáriz Borde, a graduate student at Oxford University; Kristjan Greenewald, a research scientist in the MIT-IBM Watson AI Lab; Vitor Campagnolo Guizilini, a scientist at the Toyota Research Institute; Timur Bagautdinov, a research scientist at Meta; and senior authors Vincent Sitzmann, an assistant professor of EECS at MIT who leads the Scene Representation Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Justin Solomon, an associate professor of EECS and leader of the CSAIL Geometric Data Processing Group. The research will be presented at the Conference on Neural Information Processing Systems.

From 2D images to 3D shapes

Diffusion models, such as DALL-E, are a type of generative AI model that can produce lifelike images from random noise. To train these models, researchers add noise to images and then teach the model to reverse the process and remove the noise. The models use this learned “denoising” process to create images based on a user’s text prompts.
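The noising and denoising idea can be sketched in a few lines of Python. This is a minimal illustration, not code from the researchers: the names `add_noise`, `remove_noise`, and `alpha_bar` are hypothetical (the last follows common diffusion-model notation for the share of signal kept), and the "image" is just four numbers. A real model would have to predict the noise with a trained network; here the true noise is handed back to show that a perfect prediction recovers the original image.

```python
import math
import random


def add_noise(image, alpha_bar, rng):
    """Forward step: blend the clean image with Gaussian noise.

    alpha_bar in (0, 1] is the share of signal variance that survives
    (a hypothetical parameter name, following common notation).
    """
    noise = [rng.gauss(0.0, 1.0) for _ in image]
    noisy = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * n
        for x, n in zip(image, noise)
    ]
    return noisy, noise


def remove_noise(noisy, predicted_noise, alpha_bar):
    """Reverse step: recover the clean image from a noise estimate."""
    return [
        (y - math.sqrt(1.0 - alpha_bar) * n) / math.sqrt(alpha_bar)
        for y, n in zip(noisy, predicted_noise)
    ]


rng = random.Random(0)
image = [0.1, 0.5, 0.9, 0.3]  # a tiny 4-pixel "image"
noisy, true_noise = add_noise(image, alpha_bar=0.5, rng=rng)

# A trained network would predict the noise from `noisy` alone.
# Here the true noise stands in for a perfect prediction.
recovered = remove_noise(noisy, true_noise, alpha_bar=0.5)
```

With a perfect noise estimate, `recovered` matches `image` up to floating-point error. Training a diffusion model amounts to making the network's noise estimate as close to the true noise as possible.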

But diffusion models underperform at directly generating realistic 3D shapes because there are not enough 3D data to train them. To get around this problem, researchers developed a technique called Score Distillation Sampling (SDS) in 2022 that uses a pretrained diffusion model to combine 2D images into a 3D representation.

The technique involves starting with a random 3D representation, rendering a 2D view of a desired object from a random camera angle, adding noise to that image, denoising it with a diffusion model, then optimizing the random 3D representation so it matches the denoised image. These steps are repeated until the desired 3D object is generated.
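The loop described above can be written as a toy Python sketch. Everything here is a simplifying assumption made for illustration: the "3D representation" is a short list of numbers, `render` returns a window of that list for a given camera index, and `predict_noise` is a stand-in for a pretrained diffusion model that simply assumes the clean image is `TARGET`. The update rule (predicted noise minus sampled noise) follows the commonly published form of the SDS gradient, not the researchers' code.

```python
import math
import random

# What the toy "diffusion model" believes the object looks like.
TARGET = [0.2, 0.8, 0.5, 0.1, 0.9, 0.4]
VIEW = 3  # number of scene values visible in one rendering


def render(scene, camera):
    """Toy renderer: a camera sees VIEW consecutive values, wrapping around."""
    idx = [(camera + i) % len(scene) for i in range(VIEW)]
    return idx, [scene[i] for i in idx]


def predict_noise(noisy, idx, alpha_bar):
    """Stand-in denoiser: assumes the clean view is TARGET at these indices."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(y - a * TARGET[i]) / s for y, i in zip(noisy, idx)]


def sds_optimize(steps=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    scene = [rng.uniform(0.0, 1.0) for _ in TARGET]  # random 3D representation
    for _ in range(steps):
        camera = rng.randrange(len(scene))           # random camera angle
        idx, image = render(scene, camera)           # render a 2D view
        alpha_bar = rng.uniform(0.2, 0.9)            # random noise level
        a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
        noise = [rng.gauss(0.0, 1.0) for _ in image]
        noisy = [a * x + s * n for x, n in zip(image, noise)]  # add noise
        predicted = predict_noise(noisy, idx, alpha_bar)       # denoise
        # Nudge only the scene values this camera could see.
        for i, p, n in zip(idx, predicted, noise):
            scene[i] -= lr * (p - n)
    return scene


final_scene = sds_optimize()
```

In this toy the stand-in denoiser is exact, so the sampled noise cancels out of every update and `final_scene` settles on `TARGET`. A real diffusion model is not exact, which is where the choice of noise starts to matter.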

However, 3D shapes produced this way tend to look blurry or oversaturated.

“This has been a bottleneck for a while. We know the underlying model is capable of doing better, but people didn’t know why this is happening with 3D shapes,” Lukoianov says.

The MIT researchers explored the steps of SDS and identified a mismatch between a formula that forms a key part of the process and its counterpart in 2D diffusion models. The formula tells the model how to update the random representation by adding and removing noise, one step at a time, to make it look more like the desired image.

Since part of this formula involves an equation that is too complex to be solved efficiently, SDS replaces it with randomly sampled noise at each step. The MIT researchers found that this noise leads to blurry or cartoonish 3D shapes.
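A small Python experiment gives a feel for why a freshly sampled noise term matters. It is a sketch under strong assumptions and does not reproduce the researchers' analysis: the "diffusion model" is the exact denoiser for one-pixel data drawn from a Gaussian with mean `mu` and standard deviation `sigma`, and `sds_residual` is a hypothetical helper that returns the update signal (predicted noise minus sampled noise) for a rendering that stays the same. The comparison case uses a constant noise value purely as an example of a deterministic choice; it is not the approximation described in the paper.

```python
import math
import random
import statistics


def sds_residual(render_value, noise, alpha_bar, mu=0.0, sigma=1.0):
    """Predicted noise minus sampled noise for a single pixel.

    Assumes the clean data follow N(mu, sigma^2), so the ideal
    denoiser has a closed form (an assumption for this example).
    """
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    noisy = a * render_value + s * noise
    k = a * sigma**2 / (a**2 * sigma**2 + s**2)
    denoised = mu + k * (noisy - a * mu)   # expected clean pixel given noisy
    predicted_noise = (noisy - a * denoised) / s
    return predicted_noise - noise


rng = random.Random(0)
render_value, alpha_bar = 0.7, 0.5

# Same rendering, fresh random noise on every step.
random_noise_updates = [
    sds_residual(render_value, rng.gauss(0.0, 1.0), alpha_bar)
    for _ in range(20000)
]
# Same rendering, one deterministic noise value reused on every step.
fixed_noise_updates = [
    sds_residual(render_value, 0.3, alpha_bar) for _ in range(20000)
]

spread_random = statistics.pstdev(random_noise_updates)
spread_fixed = statistics.pstdev(fixed_noise_updates)
```

For an unchanged rendering, the randomly sampled version produces update signals that fluctuate from step to step (`spread_random` is clearly above zero), while a deterministic noise choice gives the same signal every time (`spread_fixed` is zero).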

An approximate answer

Instead of trying to solve this cumbersome formula precisely, the researchers tested approximation techniques until they identified the best one. Rather than randomly sampling the noise term, their approximation technique infers the missing term from the current 3D shape rendering.

“By doing this, as the analysis in the paper predicts, it generates 3D shapes that look sharp and realistic,” Lukoianov says.

In addition, the researchers increased the resolution of the image rendering and adjusted some model parameters to further boost 3D shape quality.

In the end, they were able to use an off-the-shelf, pretrained image diffusion model to create smooth, realistic-looking 3D shapes without the need for costly retraining. The 3D objects are about as sharp as those produced by other methods that rely on ad hoc solutions.

“Trying to blindly experiment with different parameters, sometimes it works and sometimes it doesn’t, but you don’t know why. We know this is the equation we need to solve. Now, this allows us to think of more efficient ways to solve it,” he says.

Because their method relies on a pretrained diffusion model, it inherits the biases and shortcomings of that model, making it prone to hallucinations and other failures. Improving the underlying diffusion model would enhance their process.

In addition to studying the formula to see how they could solve it more effectively, the researchers are interested in exploring how these insights could improve image editing techniques.

This work is funded, in part, by the Toyota Research Institute, the U.S. National Science Foundation, the Singapore Defense Science and Technology Agency, the U.S. Intelligence Advanced Research Projects Activity, the Amazon Science Hub, IBM, the U.S. Army Research Office, the CSAIL Future of Data program, the Wistron Corporation, and the MIT-IBM Watson AI Laboratory.
