MarkTechPost@AI, September 18, 2024
DreamHOI: A Novel AI Approach for Realistic 3D Human-Object Interaction Generation Using Textual Descriptions and Diffusion Models

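DreamHOI is a novel AI method that generates realistic 3D human-object interactions from textual descriptions. It combines neural radiance fields (NeRFs) with skeleton-driven mesh articulation and uses Score Distillation Sampling to obtain gradients from pre-trained text-to-image diffusion models, which are then used to optimize pose parameters. DreamHOI can generate a wide variety of objects and interactions while preserving character identity, overcoming the difficulty existing methods have in preserving mesh identity and structure during interaction generation.

👨‍💻 DreamHOI uses text-to-image diffusion models to generate realistic 3D human-object interaction scenes from textual descriptions. It combines neural radiance fields (NeRFs) with skeleton-driven mesh articulation and uses Score Distillation Sampling gradients from a pre-trained text-to-image diffusion model to optimize pose parameters. This dual implicit-explicit representation supports a wide range of objects and interactions while preserving character identity.

🔄 DreamHOI produces high-quality results through a two-stage optimization process that includes 5000 steps of NeRF refinement. It also introduces regularizers to maintain proper model size and alignment, and uses a regressor to convert between the NeRF and skinned-mesh representations.

🚀 DreamHOI generates high-quality 3D human-object interactions across a variety of scenarios and outperforms baseline methods. It shows promise for film and game production, where it could simplify the creation of realistic virtual environments and make diverse interaction scenes easier to generate.

📊 Experimental results show that DreamHOI outperforms the baselines across scenarios, and ablation studies confirm the importance of each component.

💡 The work offers a new way to generate realistic 3D human-object interactions and opens up new possibilities for film, games, and virtual reality.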

Early work on 3D generation focused on single-view reconstruction with category-specific models. More recent methods use pre-trained image and video generators, particularly diffusion models, to enable open-domain generation. Fine-tuning on multi-view datasets has improved results, but generating complex compositions and interactions remains difficult. Techniques that improve compositionality in image generative models have proved hard to transfer to 3D generation. Some methods extend distillation approaches to compositional 3D generation, optimizing individual objects and their spatial relationships while adhering to physical constraints.

Human-object interaction synthesis has progressed with methods like InterFusion, which generates interactions from textual prompts. However, these methods offer limited control over human and object identities, and many struggle to preserve the identity and structure of the human mesh during interaction generation. These challenges highlight the need for techniques that give users greater control and integrate more practically into virtual environment production pipelines. DreamHOI builds on these earlier efforts to address the limitations and improve the generation of human-object interactions in 3D environments.

Researchers from the University of Oxford and Carnegie Mellon University introduced DreamHOI, a zero-shot method for synthesizing 3D human-object interactions (HOIs) from textual descriptions. The approach leverages text-to-image diffusion models to address the challenges posed by diverse object geometries and limited datasets. It optimizes the articulation of a human mesh using Score Distillation Sampling (SDS) gradients from these models. The method employs a dual implicit-explicit representation, combining neural radiance fields with skeleton-driven mesh articulation to preserve character identity. Because it relies on pre-trained diffusion models rather than a dedicated HOI dataset, the approach avoids extensive data collection while supporting realistic HOI generation for a wide range of objects and interactions.
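To make the role of Score Distillation Sampling more concrete, here is a minimal sketch of a one-sample SDS gradient estimate in the standard form, w(t) · (predicted noise − injected noise) · ∂x/∂θ. It is not the authors' implementation. The function names, the flat-vector "render", and the toy point-mass denoiser (which stands in for a real text-conditioned diffusion model) are all assumptions made for illustration.

```python
import numpy as np

def sds_gradient(x, dx_dtheta, predict_noise, alpha_bar, rng, weight=1.0):
    """One-sample SDS gradient estimate with respect to the parameters theta.

    x             : rendered image, flattened to shape (D,)
    dx_dtheta     : Jacobian of the render w.r.t. theta, shape (D, P)
    predict_noise : callable (x_t, alpha_bar) -> predicted noise, shape (D,)
                    (hypothetical stand-in for a diffusion model)
    alpha_bar     : cumulative noise-schedule coefficient in (0, 1)
    """
    eps = rng.standard_normal(x.shape)                      # injected noise
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = predict_noise(x_t, alpha_bar)                 # model's noise estimate
    # The denoiser's own Jacobian is skipped, as in the usual SDS formulation.
    return weight * (eps_hat - eps) @ dx_dtheta

def make_point_mass_denoiser(target):
    """Toy denoiser: the exact noise predictor when all data equals `target`."""
    def predict_noise(x_t, alpha_bar):
        return (x_t - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)
    return predict_noise

def optimize(theta0, target, steps=200, lr=0.1, seed=0):
    """Gradient descent on theta where the 'render' is theta itself."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    denoiser = make_point_mass_denoiser(np.asarray(target, dtype=float))
    jacobian = np.eye(theta.size)                           # render(theta) = theta
    for _ in range(steps):
        alpha_bar = rng.uniform(0.2, 0.8)                   # random noise level
        theta -= lr * sds_gradient(theta, jacobian, denoiser, alpha_bar, rng)
    return theta

if __name__ == "__main__":
    print(optimize([3.0, -2.0], [1.0, 0.5]))
```

In this toy setting the update pulls the parameters toward the single point the denoiser considers likely. With a real diffusion model, the same update pulls rendered views toward images that match the text prompt.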

DreamHOI employs a dual implicit-explicit representation, combining neural radiance fields (NeRFs) with skeleton-driven mesh articulation. This approach optimizes skinned human mesh articulation while preserving character identity. The method utilizes Score Distillation Sampling to obtain gradients from pre-trained text-to-image diffusion models, guiding the optimization process. The optimization alternates between implicit and explicit forms, refining mesh articulation parameters to align with textual descriptions. Rendering the skinned mesh alongside the object mesh allows for direct optimization of explicit pose parameters, enhancing efficiency due to the reduced number of parameters.
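The article does not say which skinning model the skeleton-driven articulation uses. As an illustration, the sketch below assumes standard linear blend skinning, where a few per-bone transforms move every vertex of a fixed rest mesh. The function names and the two-bone example are hypothetical.

```python
import numpy as np

def rotation_z(angle):
    """4x4 homogeneous transform rotating by `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    transform = np.eye(4)
    transform[:2, :2] = [[c, -s], [s, c]]
    return transform

def linear_blend_skinning(vertices, weights, bone_transforms):
    """Pose a rest mesh by blending per-bone rigid transforms.

    vertices        : (V, 3) rest-pose vertex positions (fixed: the identity)
    weights         : (V, B) skinning weights, each row summing to 1 (fixed)
    bone_transforms : (B, 4, 4) homogeneous transforms (the pose parameters)
    """
    ones = np.ones((len(vertices), 1))
    homogeneous = np.concatenate([vertices, ones], axis=1)          # (V, 4)
    # Apply every bone transform to every vertex: result is (V, B, 4).
    per_bone = np.einsum("bij,vj->vbi", bone_transforms, homogeneous)
    # Blend the candidates with the skinning weights: result is (V, 4).
    blended = np.einsum("vb,vbi->vi", weights, per_bone)
    return blended[:, :3]

if __name__ == "__main__":
    rest = np.array([[1.0, 0.0, 0.0]] * 3)
    skin = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
    pose = np.stack([np.eye(4), rotation_z(np.pi / 2)])
    print(linear_blend_skinning(rest, skin, pose))
```

Only `bone_transforms` changes during posing, while the rest vertices and weights stay fixed. This is one way to read the article's two claims: character identity is preserved, and the explicit stage optimizes far fewer parameters than a NeRF.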

Extensive experiments validate DreamHOI’s effectiveness. Ablation studies assess the impact of individual components, including the regularizers and rendering techniques. Qualitative and quantitative evaluations compare the model against baselines, and tests with diverse prompts show that the method generates high-quality interactions across different scenarios. A guidance mixture technique further improves the coherence of the optimization. Together, these experiments support DreamHOI as a robust approach for generating realistic and contextually appropriate human-object interactions in 3D environments.

DreamHOI excels in generating 3D human-object interactions from textual prompts, outperforming baselines with higher CLIP similarity scores. Its dual implicit-explicit representation combines NeRFs and skeleton-driven mesh articulation, enabling flexible pose optimization while preserving character identity. The two-stage optimization process, including 5000 steps of NeRF refinement, contributes to high-quality results. Regularizers play a crucial role in maintaining proper model size and alignment. A regressor facilitates transitions between NeRF and skinned mesh representations. DreamHOI overcomes the limitations of methods like DreamFusion in maintaining mesh identity and structure. This approach shows promise for applications in film and game production, simplifying the creation of realistic virtual environments with interacting humans.
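The article reports higher CLIP similarity scores but does not describe the evaluation protocol. CLIP similarity is conventionally the cosine similarity between an image embedding and a text embedding. The sketch below assumes the embeddings have already been produced by some CLIP encoder and averages the score over several rendered views. Both the averaging and the function name are assumptions, not details from the paper.

```python
import numpy as np

def clip_style_similarity(image_embeddings, text_embedding):
    """Mean cosine similarity between view embeddings and a prompt embedding.

    image_embeddings : (N, D) array, one embedding per rendered view
                       (assumed to come from a CLIP image encoder)
    text_embedding   : (D,) array for the prompt
                       (assumed to come from the matching CLIP text encoder)
    """
    images = np.asarray(image_embeddings, dtype=float)
    text = np.asarray(text_embedding, dtype=float)
    # Normalize so that the dot product equals the cosine of the angle.
    images = images / np.linalg.norm(images, axis=1, keepdims=True)
    text = text / np.linalg.norm(text)
    return float(np.mean(images @ text))

if __name__ == "__main__":
    views = [[1.0, 0.0], [0.0, 1.0]]   # placeholder embeddings, not real CLIP output
    prompt = [1.0, 0.0]
    print(clip_style_similarity(views, prompt))
```

A higher score means the rendered views sit closer to the prompt in the shared embedding space, which is why the metric is commonly used as a proxy for how well the output matches the text.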

In conclusion, DreamHOI introduces a novel approach for generating realistic 3D human-object interactions using textual prompts. The method employs a dual implicit-explicit representation, combining NeRFs with explicit pose parameters of skinned meshes. This approach, along with Score Distillation Sampling, optimizes pose parameters effectively. Experimental results demonstrate DreamHOI’s superior performance compared to baseline methods, with ablation studies confirming the importance of each component. The paper addresses challenges in direct optimization of pose parameters and highlights DreamHOI’s potential to simplify virtual environment creation. This advancement opens up new possibilities for applications in the entertainment industry and beyond.


Check out the Paper and Project Page. All credit for this research goes to the researchers of this project.

The post DreamHOI: A Novel AI Approach for Realistic 3D Human-Object Interaction Generation Using Textual Descriptions and Diffusion Models appeared first on MarkTechPost.
