MarkTechPost@AI · February 5
Meta AI Introduces VideoJAM: A Novel AI Framework that Enhances Motion Coherence in AI-Generated Videos

Meta AI introduces VideoJAM, a framework aimed at strengthening motion representation in video generation models. By encouraging a joint appearance-motion representation, VideoJAM improves the coherence of generated motion. Unlike conventional approaches that treat motion as a secondary consideration, VideoJAM integrates it directly into both training and inference. The framework can be incorporated into existing models with minimal modifications, offering an efficient way to enhance motion quality without altering the training data. Through its joint appearance-motion representation and Inner-Guidance mechanism, VideoJAM enables models to generate videos with greater temporal consistency and realism.

💡VideoJAM is a framework for introducing a stronger motion representation into video generation models, designed to improve the coherence of generated motion through a joint appearance-motion representation.

⚙️VideoJAM consists of two main components: a training phase and an inference phase (the Inner-Guidance mechanism). During training, an input video and its corresponding motion representation are both noised and embedded into a joint latent representation. During inference, VideoJAM introduces Inner-Guidance, in which the model uses its own evolving motion predictions to steer video generation.

🚀Evaluations of VideoJAM show markedly improved motion coherence across different types of videos. Compared with established models such as Sora and Kling, VideoJAM reduces artifacts such as frame warping and unnatural object deformation. It also consistently earns higher motion-coherence scores in both automatic and human evaluations.

🛠️VideoJAM integrates effectively with a variety of pretrained video models without extensive retraining, demonstrating its adaptability. The framework improves video quality using only two additional linear layers, making it a lightweight and practical solution.
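To illustrate that integration, here is a minimal PyTorch-style sketch of how two extra linear layers could wrap an existing video diffusion backbone. The class name, latent shapes, and forward signature are illustrative assumptions based on the description in this article, not Meta's released code.

```python
# Minimal sketch: wrapping a pretrained backbone with VideoJAM-style layers.
# All names and shapes here are illustrative assumptions.
import torch
import torch.nn as nn


class JointAppearanceMotionWrapper(nn.Module):
    """Wraps a pretrained video diffusion backbone so it consumes and
    predicts both appearance and motion from one joint latent."""

    def __init__(self, backbone: nn.Module, latent_dim: int):
        super().__init__()
        self.backbone = backbone  # pretrained model, reused unchanged
        # The only new parameters: one input projection (Win+) ...
        self.w_in = nn.Linear(2 * latent_dim, latent_dim)
        # ... and two output heads (Wout+) for appearance and motion.
        self.w_out_appearance = nn.Linear(latent_dim, latent_dim)
        self.w_out_motion = nn.Linear(latent_dim, latent_dim)

    def forward(self, x_noisy, d_noisy, t):
        # Fuse the noised video latent (x) and motion latent (d) into one
        # joint representation, run the unchanged backbone, then read
        # appearance and motion predictions back out.
        joint = self.w_in(torch.cat([x_noisy, d_noisy], dim=-1))
        h = self.backbone(joint, t)
        return self.w_out_appearance(h), self.w_out_motion(h)
```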

Despite recent advancements, generative video models still struggle to represent motion realistically. Many existing models focus primarily on pixel-level reconstruction, often leading to inconsistencies in motion coherence. These shortcomings manifest as unrealistic physics, missing frames, or distortions in complex motion sequences. For example, models may struggle with depicting rotational movements or dynamic actions like gymnastics and object interactions. Addressing these issues is essential for improving the realism of AI-generated videos, particularly as their applications expand into creative and professional domains.

Meta AI presents VideoJAM, a framework designed to introduce a stronger motion representation in video generation models. By encouraging a joint appearance-motion representation, VideoJAM improves the consistency of generated motion. Unlike conventional approaches that treat motion as a secondary consideration, VideoJAM integrates it directly into both the training and inference processes. This framework can be incorporated into existing models with minimal modifications, offering an efficient way to enhance motion quality without altering training data.

Technical Approach and Benefits

VideoJAM consists of two primary components:

1. Training Phase: An input video (x1) and its corresponding motion representation (d1) are both subjected to noise and embedded into a single joint latent representation using a linear layer (Win+). A diffusion model then processes this representation, and two linear projection layers (Wout+) predict both the appearance and motion components from it. This structured approach helps balance appearance fidelity with motion coherence, mitigating the trade-off common in previous models.
2. Inference Phase (Inner-Guidance Mechanism): During inference, VideoJAM introduces Inner-Guidance, in which the model uses its own evolving motion predictions to steer video generation. Unlike conventional techniques that rely on fixed external signals, Inner-Guidance lets the model adjust its motion representation dynamically, yielding smoother and more natural transitions between frames. A sketch of both phases follows this list.
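The sketch below continues the assumptions above. The epsilon-prediction objective, the equal loss weighting, the zeroed-motion "unconditional" pass, and the CFG-style guidance formula are plausible readings of the description rather than the paper's exact formulation; `model` is any module returning (appearance_pred, motion_pred), and `scheduler` stands in for a generic diffusion noise schedule with num_steps, add_noise(), and step().

```python
# Hedged sketches of VideoJAM's two phases; not Meta's released code.
import torch
import torch.nn.functional as F


def training_step(model, scheduler, x1, d1):
    """Training phase: noise the video x1 and its motion representation d1,
    embed them jointly (inside `model`), and supervise both predictions."""
    t = torch.randint(0, scheduler.num_steps, (x1.shape[0],))
    eps_x, eps_d = torch.randn_like(x1), torch.randn_like(d1)
    x_noisy = scheduler.add_noise(x1, eps_x, t)
    d_noisy = scheduler.add_noise(d1, eps_d, t)
    pred_x, pred_d = model(x_noisy, d_noisy, t)
    # Equal weighting of appearance and motion errors (an assumption).
    return F.mse_loss(pred_x, eps_x) + F.mse_loss(pred_d, eps_d)


@torch.no_grad()
def inner_guidance_step(model, scheduler, x_t, d_t, t, w=2.0):
    """Inference phase: one denoising step in which the model's own motion
    prediction steers generation, in the spirit of classifier-free guidance."""
    # Prediction conditioned on the model's current motion estimate.
    pred_x_joint, pred_d = model(x_t, d_t, t)
    # A motion-ablated prediction (zeroing d_t is one plausible choice).
    pred_x_plain, _ = model(x_t, torch.zeros_like(d_t), t)
    # Extrapolate toward the motion-aware prediction.
    pred_x = pred_x_plain + w * (pred_x_joint - pred_x_plain)
    # Advance both latents; the updated motion guides the next step.
    x_prev = scheduler.step(pred_x, t, x_t)
    d_prev = scheduler.step(pred_d, t, d_t)
    return x_prev, d_prev
```

Because guidance comes from the model's own motion head rather than a fixed external signal, the motion estimate is refined jointly with the video at every denoising step.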

Insights

Evaluations of VideoJAM indicate notable improvements in motion coherence across different types of videos. Key findings include:

- Compared with established models such as Sora and Kling, VideoJAM reduces artifacts such as frame warping and unnatural object deformation.
- VideoJAM consistently achieves higher motion-coherence scores in both automatic and human evaluations.
- The framework integrates effectively with a range of pretrained video models without extensive retraining, adding only two linear layers.

Conclusion

VideoJAM provides a structured approach to improving motion coherence in AI-generated videos by integrating motion as a key component rather than an afterthought. By leveraging a joint appearance-motion representation and Inner-Guidance mechanism, the framework enables models to generate videos with greater temporal consistency and realism. With minimal architectural modifications required, VideoJAM offers a practical means to refine motion quality in generative video models, making them more reliable for a range of applications.


Check out the Paper and Project Page. All credit for this research goes to the researchers of this project.
