MarkTechPost@AI, November 21, 2024
Meet The Matrix: A New AI Approach to Infinite-Length and Real-Time Video Generation

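The Matrix is a foundation world model jointly developed by Alibaba, the University of Hong Kong, and the University of Waterloo. It generates infinite-length 720p video streams with real-time, frame-level control. Trained with supervised and unsupervised learning on footage from AAA games and real-world video, it navigates seamlessly across game and real-world environments, for example simulating a BMW X3 driving through an office. The model is built on a video Diffusion Transformer (DiT) and the "Shift-Window Denoise Process Model" (Swin-DPM), which together produce smooth, high-resolution video, and its Interactive Module lets user input steer the output dynamically at up to 16 FPS. Because it is open source, The Matrix is also a scalable, flexible tool for gaming, autonomous-driving simulation, virtual reality, and related fields.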

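🤔**Built on a video Diffusion Transformer (DiT):** The Matrix uses a video DiT to continuously generate smooth, high-resolution video, while the Shift-Window Denoise Process Model (Swin-DPM) efficiently manages attention over long video sequences, enabling infinite-length generation.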

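🎮**Trained on game and real-world data:** The model learns through supervised and unsupervised learning from AAA games (such as Forza Horizon 5 and Cyberpunk 2077) and real-world video. This lets it navigate seamlessly across game and real-world environments and generalize well, for example simulating a car driving through an indoor environment that does not appear in the training data.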

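🚀**Real-time interaction and frame-level control:** The Matrix integrates an Interactive Module that lets users steer the generated video dynamically, for example with keyboard commands, rendering in real time at up to 16 FPS with frame-level precision in motion control.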

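📊**Strong performance metrics:** In certain settings The Matrix reaches a Move-PSNR (a peak signal-to-noise ratio metric) of about 28.98, and after optimization with the Stream Consistency Model (SCM) it renders in real time at 8-16 FPS.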

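💡**Open source, enabling further development:** Because The Matrix is open source, developers can experiment with and adapt it, which encourages continued innovation and its application in gaming, training simulations, and virtual experiences.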

Generating high-quality, real-time video simulations is difficult, especially when the video must run for a long time without losing quality. World models for video generation have traditionally been limited by high computational cost, short output duration, and a lack of real-time interactivity. Manually configured assets, as used in AAA game development, are expensive to build, which makes that approach unsustainable for continuous video production at scale. Many existing models, such as Sora or Genie, struggle to generate realistic, high-resolution simulations or to run in real time, which limits their practical use. These barriers call for a more scalable and realistic approach to generating high-fidelity, interactive video simulations.

Meet The Matrix

The Matrix is a foundation world model for generating infinite-length videos with real-time, frame-level control. Developed by a collaborative team from Alibaba, the University of Hong Kong, and the University of Waterloo, The Matrix addresses many of the challenges traditional models face. It can produce infinitely long 720p video streams that replicate real-world settings, such as urban landscapes and natural terrain, while maintaining real-time interactivity at frame-level precision. Unlike traditional simulators that require extensive manual configuration, The Matrix leverages supervised and unsupervised learning from data sources such as AAA games (e.g., Forza Horizon 5 and Cyberpunk 2077) and real-world video footage. This approach lets the model navigate both gaming and real-world environments seamlessly, for example simulating a BMW X3 driving through an office setting, a scenario that does not appear in the training data.

Technical Details

The Matrix is built upon a video Diffusion Transformer (DiT) model, which allows it to produce smooth, high-resolution video content continuously. A key innovation that makes this possible is the “Shift-Window Denoise Process Model” (Swin-DPM), which enables infinite-length video generation by effectively managing the attention mechanisms required for long video sequences. This process works in tandem with the Interactive Module, which incorporates user inputs (such as keyboard commands) to dynamically influence the generated video content. The result is a model that delivers a high-quality simulation with real-time control, operating at speeds of up to 16 frames per second (FPS).
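To make the shift-window idea more concrete, the Python sketch below shows one plausible reading of a sliding-window denoising loop; it is not the authors' implementation. A bounded window holds frames at staggered noise levels; each step runs one joint denoising pass over the whole window, emits the head frame once it is clean, and admits a fresh noise frame that carries the latest keyboard action. Everything here is hypothetical: the `KEY_TO_ACTION` table, the `stream_frames` function, the `denoise` callback signature, and `toy_denoise`, which stands in for the video DiT. A single float stands in for a frame latent.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Iterator, List, Tuple

# Hypothetical key-to-action table; the article does not describe the
# Interactive Module's actual action vocabulary or encoding.
KEY_TO_ACTION: Dict[str, int] = {"w": 1, "a": 2, "s": 3, "d": 4}

Frame = float  # toy stand-in for one frame's latent tensor
# Hypothetical denoiser: (latents, noise levels, per-frame actions) -> latents
Denoiser = Callable[[List[Frame], List[int], List[int]], List[Frame]]


def stream_frames(
    keys: Iterable[str],
    denoise: Denoiser,
    new_noise: Callable[[], Frame],
    window: int = 4,
) -> Iterator[Frame]:
    """Sliding-window denoising with staggered noise levels (sketch only)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    # Each entry: (latent, remaining noise level, action attached to that frame).
    buf: Deque[Tuple[Frame, int, int]] = deque()
    for key in keys:
        action = KEY_TO_ACTION.get(key.lower(), 0)  # 0 = no input
        # A pure-noise frame carrying this step's action enters the tail.
        buf.append((new_noise(), window, action))
        latents, levels, actions = (list(col) for col in zip(*buf))
        # One joint pass over the bounded window: per-step cost does not
        # grow with the total length of the stream.
        latents = denoise(latents, levels, actions)
        buf = deque(zip(latents, (lvl - 1 for lvl in levels), actions))
        # The head frame has reached noise level 0: emit it and slide on.
        if buf[0][1] == 0:
            yield buf.popleft()[0]


def toy_denoise(latents: List[Frame], levels: List[int], actions: List[int]) -> List[Frame]:
    """Toy stand-in: move each latent 1/level of the way toward its action.

    At level 1 a frame lands on its action value, i.e. it becomes "clean".
    """
    return [x + (a - x) / lvl for x, lvl, a in zip(latents, levels, actions)]


if __name__ == "__main__":
    # Five key presses with a window of 3 yield three finished frames.
    print(list(stream_frames("wasdw", toy_denoise, lambda: 0.0, window=3)))
```

Because the window has a fixed length, the work per emitted frame stays constant however long the stream runs, which is what makes unbounded generation feasible in this sketch. The trade-off is a delay of `window - 1` steps between a key press and the frame that reflects it. At the quoted 16 FPS, each loop iteration would have a budget of 1/16 s, or 62.5 ms.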

The Matrix can generalize from game environments to real-world contexts without additional training, making it a versatile tool for creating interactive simulations, potentially useful for video games, autonomous vehicle simulation, virtual reality experiences, and more. Additionally, the open-source nature of The Matrix allows for further experimentation and adaptation by developers, encouraging ongoing innovation.

Importance and Results

The importance of The Matrix lies in its ability to bridge the gap between simulated and real-world environments, making it a valuable tool in world modeling. The scalability offered by The Matrix reduces the cost of generating interactive simulations, eliminating the need for handcrafted environments. The results reported in the paper show that The Matrix achieves frame-level precision in movement control across multiple scenes, including those in Cyberpunk 2077 and Forza Horizon 5. The model demonstrates strong generalization, enabling precise control even in out-of-distribution settings such as driving indoors, which was not part of the training data.

In terms of visual quality and control accuracy, The Matrix reached a Move-PSNR (a peak signal-to-noise ratio metric) of around 28.98 in certain settings, and it renders in real time at 8-16 FPS after optimization with the Stream Consistency Model (SCM). These results make The Matrix an effective world simulator that combines infinite-length video generation with high-quality rendering and real-time performance. Reaching real-time speed costs some visual quality, but the overall quality still surpasses that of previous models, offering a realistic and engaging simulation.
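For reference, standard PSNR compares a generated frame with a reference frame as PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value and MSE is the mean squared pixel error. The article does not spell out which frames or regions Move-PSNR is computed over, so the Python sketch below only computes plain PSNR over flat pixel sequences; the function name `psnr` and the 8-bit default `max_val=255.0` are assumptions for illustration.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], generated: Sequence[float], max_val: float = 255.0) -> float:
    """Plain PSNR in dB between two equally sized pixel sequences."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no error at all
    return 10.0 * math.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    # Every pixel off by 10 grey levels on an 8-bit scale.
    print(psnr([0, 0, 0, 0], [10, 10, 10, 10]))
```

Because the scale is logarithmic, each additional 10 dB corresponds to a tenfold reduction in mean squared error.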

Conclusion

The Matrix represents a significant advancement in video generation technology, providing a scalable solution for producing infinite-length video streams with real-time, interactive capabilities. By leveraging advanced diffusion techniques and an efficient training pipeline, The Matrix achieves a level of quality and generalizability that previous models did not reach. This foundation model not only brings us closer to realizing immersive virtual environments but also demonstrates the potential for applications in gaming, training simulations, and virtual experiences. With its combination of scalability, real-time control, and open-source availability, The Matrix sets a new standard for world modeling in the era of AI-driven simulations.


Check out the Paper and Details. All credit for this research goes to the researchers of this project.

The post Meet The Matrix: A New AI Approach to Infinite-Length and Real-Time Video Generation appeared first on MarkTechPost.