MarkTechPost@AI · January 20
GameFactory: Leveraging Pre-trained Video Models for Creating New Games

GameFactory is a framework for addressing scene generalization in game video generation. It leverages a pre-trained video diffusion model and a multi-phase training strategy to enable the creation of new games. Its evaluation of different control mechanisms yields important insights, but developing a fully fledged generative game engine still faces many challenges.

🎮 GameFactory is a framework that tackles scene generalization in game video generation, leveraging pre-trained models to create new games.

💻 The multi-phase training strategy includes adapting to the target game domain with LoRA, training an action control module, and then removing the LoRA weights.

📊 Evaluation shows how different control mechanisms perform in different settings; for example, cross-attention handles discrete control signals better.

🚧 Developing a fully fledged generative game engine still faces multiple challenges, such as creating diverse levels.

Video diffusion models have emerged as powerful tools for video generation and physics simulation, showing promise in developing game engines. These generative game engines function as video generation models with action controllability, allowing them to respond to user inputs like keyboard and mouse interactions. A critical challenge in this field is scene generalization – the ability to create new game scenes beyond existing ones. While collecting large-scale action-annotated video datasets would be the most straightforward approach to achieve this, such annotation is prohibitively expensive and impractical for open-domain scenarios. This limitation creates a barrier to developing versatile game engines that can generate diverse and novel game environments.

Recent approaches in video generation and game physics have explored various methodologies, with video diffusion models emerging as a significant advancement. These models have evolved from U-Net to Transformer-based architectures, enabling the generation of more realistic and longer videos. Further, methods like Direct-a-Video offer basic camera control, while MotionCtrl and CameraCtrl provide more complex camera-pose manipulation. In the gaming domain, projects such as DIAMOND, GameNGen, and PlayGen have attempted game-specific implementations but overfit to particular games and datasets, showing limited scene generalization.

Researchers from The University of Hong Kong and Kuaishou Technology have proposed GameFactory, a groundbreaking framework designed to address scene generalization in game video generation. The framework utilizes pre-trained video diffusion models trained on open-domain video data to enable the creation of entirely new and diverse games. The researchers also developed a multi-phase training strategy that separates game-style learning from action control, overcoming the domain gap between open-domain priors and limited game datasets. In addition, they released GF-Minecraft, a high-quality action-annotated video dataset, and extended the framework to support autoregressive, action-controllable game video generation, enabling the production of interactive game videos of unlimited length.
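
To give a concrete picture of the autoregressive extension, here is a minimal sketch of chunk-by-chunk rollout, conditioning each new chunk on the most recently generated frames plus the incoming user actions. The interface (`model.sample`, the chunking scheme) is a hypothetical stand-in for illustration, not the paper's actual API:

```python
import torch

def autoregressive_rollout(model, first_frames, action_chunks, chunk_len=16):
    """Sketch of unlimited-length generation: each chunk of new frames is
    sampled conditioned on the tail of the video generated so far and the
    user's actions for that chunk. `model.sample` is a hypothetical stand-in
    for one conditional sampling run of the diffusion model."""
    frames = [first_frames]                 # (B, T0, C, H, W)
    context = first_frames
    for actions in action_chunks:           # stream of per-chunk action signals
        chunk = model.sample(context=context, actions=actions,
                             num_frames=chunk_len)
        frames.append(chunk)
        context = chunk[:, -chunk_len:]     # slide the conditioning window
    return torch.cat(frames, dim=1)         # full video, arbitrary length
```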

GameFactory employs a multi-phase training strategy to achieve effective scene generalization and action control. The process begins with a pre-trained video diffusion model and proceeds through three phases. In Phase #1, the model uses LoRA adaptation to specialize in the target game domain while preserving most original parameters. Phase #2 focuses exclusively on training the action control module, with the pre-trained parameters and LoRA weights frozen; this separation prevents style-control entanglement and lets the model focus purely on learning action control. In Phase #3, the LoRA weights are removed while the action control module parameters are retained, allowing the system to generate controlled game videos across diverse open-domain scenarios without being tied to a specific game's style.
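
As a rough illustration of this phase separation, the following minimal PyTorch sketch shows how each phase could freeze and unfreeze different parameter groups. The module names (`backbone`, `lora`, `action_control`) are assumptions for clarity, not the authors' code:

```python
import torch.nn as nn

class GameFactoryModel(nn.Module):
    def __init__(self, backbone: nn.Module, lora: nn.Module,
                 action_control: nn.Module):
        super().__init__()
        self.backbone = backbone              # pre-trained open-domain video model
        self.lora = lora                      # low-rank adapters for the game domain
        self.action_control = action_control  # keyboard/mouse conditioning module

def set_trainable(module: nn.Module, trainable: bool) -> None:
    for p in module.parameters():
        p.requires_grad = trainable

def configure_phase(model: GameFactoryModel, phase: int) -> None:
    if phase == 1:
        # Phase #1: learn the game style via LoRA; everything else frozen.
        set_trainable(model.backbone, False)
        set_trainable(model.lora, True)
        set_trainable(model.action_control, False)
    elif phase == 2:
        # Phase #2: learn action control; backbone and LoRA both frozen,
        # keeping style learning and control learning disentangled.
        set_trainable(model.backbone, False)
        set_trainable(model.lora, False)
        set_trainable(model.action_control, True)
    elif phase == 3:
        # Phase #3: discard the LoRA weights so generation is no longer tied
        # to the game's style, but keep the trained action control module.
        model.lora = nn.Identity()
        set_trainable(model.backbone, False)
        set_trainable(model.action_control, False)
```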

Evaluation of GameFactory’s performance reveals significant insights into different control mechanisms and their effectiveness. Cross-attention outperforms concatenation for discrete control signals such as keyboard inputs, as measured by Flow-MSE. Concatenation, however, proves more effective for continuous mouse-movement signals, likely because the similarity computation in cross-attention tends to diminish the impact of a control signal’s magnitude. In terms of style consistency, measured by CLIPSim and FID, the different methods perform comparably, thanks to the decoupling of style learning in Phase #1. The system masters both basic atomic actions and complex combined movements across diverse game scenarios.
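
The two injection mechanisms compared above can be sketched as follows (hypothetical PyTorch modules for illustration; dimensions and details are assumptions, not the paper's implementation). Cross-attention retrieves a learned embedding per pressed key, while concatenation appends the raw mouse deltas to each token so that their magnitude survives:

```python
import torch
import torch.nn as nn

class CrossAttentionControl(nn.Module):
    """Discrete signals (key presses) injected via cross-attention: each key
    index retrieves a learned embedding that the video tokens attend to."""
    def __init__(self, dim: int, num_keys: int, num_heads: int = 8):
        super().__init__()
        self.key_embed = nn.Embedding(num_keys, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, video_tokens: torch.Tensor,  # (B, N, dim)
                key_ids: torch.Tensor              # (B, T) integer key indices
                ) -> torch.Tensor:
        actions = self.key_embed(key_ids)          # (B, T, dim)
        out, _ = self.attn(video_tokens, actions, actions)
        return video_tokens + out                  # residual injection

class ConcatControl(nn.Module):
    """Continuous signals (mouse deltas) injected by concatenation, which
    preserves the signal magnitude that attention similarity would wash out."""
    def __init__(self, dim: int, signal_dim: int = 2):
        super().__init__()
        self.proj = nn.Linear(dim + signal_dim, dim)

    def forward(self, video_tokens: torch.Tensor,  # (B, N, dim)
                mouse: torch.Tensor                # (B, N, 2) per-token (dx, dy)
                ) -> torch.Tensor:
        return self.proj(torch.cat([video_tokens, mouse], dim=-1))
```

The residual add keeps the backbone's features intact when no keys are pressed, while the linear projection after concatenation lets the network scale its response to how far the mouse actually moved.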

In this paper, the researchers introduced GameFactory, which represents a significant advancement in generative game engines by addressing the crucial challenge of scene generalization in game video generation. The framework demonstrates the feasibility of creating new games through generative interactive videos by effectively utilizing open-domain video data and a novel multi-phase training strategy. While this marks an important milestone, several challenges remain on the path to fully capable generative game engines, including diverse level creation, implementation of gameplay mechanics, development of player feedback systems, in-game object manipulation, and real-time game generation. GameFactory establishes a promising foundation for future research in this evolving field.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
