MarkTechPost@AI December 5, 2024
Google DeepMind Introduces Genie 2: An Autoregressive Latent Diffusion Model for Virtual World and Game Creation with Minimal Input

Google DeepMind has introduced Genie 2, a multimodal AI model designed to close the gap between creativity and AI and to reshape interactive content creation, especially in game development and virtual worlds. Building on its predecessor Genie, Genie 2 can generate complex, playable game environments from simple input: written descriptions, images, and even hand-drawn sketches become dynamic, immersive game scenes, letting users without programming skills create interactive virtual worlds. By analyzing large amounts of video data, it learns how players interact with their environments, generates virtual spaces that users can explore and engage with, and can interpret input on its own and turn it into fully playable elements without detailed instructions.

🤔 **Genie 2 turns multimodal input into virtual worlds:** Genie 2 can convert written descriptions, images, and even hand-drawn sketches into playable game environments, dramatically lowering the barrier to virtual-world creation and opening it to more people.

🎬 **A spatiotemporal Transformer processes video content:** Genie 2 uses a spatiotemporal Transformer to process video effectively, analyzing the spatial and temporal dimensions of video frames and predicting the actions in a sequence in order to generate the next playable frame, building more realistic, dynamic virtual worlds.

🎮 **Learning player behavior to generate game elements:** By analyzing video data, Genie 2 learns how players behave in different environments, such as walking, jumping, or interacting with objects, and generates matching interactive elements from those patterns, building responsive, interactive virtual worlds without detailed user instructions.

🌐 **Learning game rules from massive video data:** Trained on large amounts of internet video, especially gameplay footage, Genie 2 learns the basic rules and dynamics of game environments and can predict appropriate responses to user input, generating complex, dynamic game worlds without a hand-written rulebook.

📽️ **A video tokenizer and dynamics model keep output coherent:** Genie 2 uses a video tokenizer to break video frames into smaller chunks, then a dynamics model together with a latent action model (LAM) to predict the next frame, keeping the virtual world consistent and logical for smoother, more natural interaction.

Google DeepMind has introduced Genie 2, a multimodal AI model designed to reduce the gap between creativity and AI. Genie 2 is poised to redefine the future of interactive content creation, particularly in video game development and virtual worlds. Building upon the foundation of its predecessor, the original Genie, this new iteration demonstrates significant advances, including the ability to generate complex, fully playable virtual environments from simple input. Whether the input is a written description, an image, or a hand-drawn sketch, Genie 2 can transform it into a dynamic, immersive video game landscape.

Genie 2's intuitive system lets anyone, not just those with programming skills, craft detailed, interactive virtual environments. The AI tool analyzes vast datasets, including video content, to learn how players interact with their environment. This allows it to generate virtual spaces where users can actively participate and explore. What sets Genie 2 apart is its ability to autonomously interpret and transform input into fully functioning gameplay elements without the need for explicit instructions.

Spatiotemporal (ST) transformers are a unique form of transformer model that allows Genie 2 to process video content effectively. Unlike traditional transformers optimized for processing text, ST transformers can analyze video frames’ spatial and temporal components. This enables Genie 2 to predict what actions might happen in a video sequence, which is critical for generating the next playable frame in a video game. Essentially, the AI learns the underlying patterns in video content and how objects interact as time progresses, allowing it to simulate realistic, evolving virtual worlds. Through this sophisticated method, it can understand not only the individual frames of a video but also the transitions between them, enabling more fluid, lifelike virtual environments.
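To make the idea concrete, here is a minimal, hypothetical sketch of factored spatiotemporal attention in numpy. The shapes, the single attention layer, and the spatial-then-temporal ordering are illustrative assumptions, not Genie 2's actual architecture: spatial attention mixes tokens within each frame, then temporal attention mixes the same token position across frames.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over the second-to-last axis.
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def st_block(x):
    """Factored spatiotemporal attention (toy sketch).
    x: (T, S, D) -- T frames, S spatial tokens per frame, D channels."""
    x = x + attention(x, x, x)        # spatial: attends over the S tokens of each frame
    xt = x.swapaxes(0, 1)             # (S, T, D)
    xt = xt + attention(xt, xt, xt)   # temporal: attends over the T frames per token
    return xt.swapaxes(0, 1)          # back to (T, S, D)

T, S, D = 4, 9, 8
x = np.random.default_rng(0).normal(size=(T, S, D))
y = st_block(x)
```

Factoring attention this way keeps the cost at O(S²) + O(T²) per block instead of O((S·T)²) for full joint attention, which is why ST transformers are practical for video.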

Google Genie 2 can learn latent actions from video content. This feature enables the AI to predict player actions in a game or virtual world without explicit instructions. 

For example, if a user provides a simple image or description of a space, Genie 2 can infer the most likely actions a player would take in that environment, such as walking, jumping, or interacting with objects.
This capability allows users to create personalized virtual spaces that respond naturally to player input. The feature is impressive because it mimics the dynamic, interactive behavior of modern video games, where the environment reacts to player choices and actions in real time.

Another great feature of Genie 2 is its ability to create entirely new gameplay experiences from relatively minimal input. This is accomplished through training on a massive dataset of internet videos, particularly those showcasing gameplay, which allows Genie 2 to learn the basic rules and dynamics of gaming environments. It then uses this knowledge to predict the appropriate responses to user inputs, generating complex, dynamic worlds without an extensive rulebook. This learning process from video content is integral to its success, as it makes Genie 2 adaptable and capable of handling a vast variety of virtual scenarios.

The core of Genie 2’s operation is a video tokenizer, which reduces the complexity of video frames into smaller, more manageable chunks. These chunks, called tokens, are easier for the AI to process and manipulate. Using these tokens, Genie 2 predicts the next frame of a video sequence by evaluating the actions within the video, effectively continuing the story or gameplay sequence. This ability to generate the next frame on the fly is essential for creating immersive, playable environments, as it allows users to build games that evolve naturally over time.
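The simplest form of video tokenization is splitting each frame into non-overlapping patches and flattening each patch into a vector; real tokenizers learn a compressed codebook on top of this, but the patch step below (a toy sketch, with the 8x8 patch size chosen arbitrarily) shows how a frame becomes a sequence of tokens.

```python
import numpy as np

def tokenize_frame(frame, patch=8):
    """Split an HxWxC frame into non-overlapping patch tokens (toy sketch).
    Returns an array of shape (num_tokens, patch*patch*C), one flat vector per patch."""
    H, W, C = frame.shape
    assert H % patch == 0 and W % patch == 0, "frame must divide evenly into patches"
    return (frame.reshape(H // patch, patch, W // patch, patch, C)
                 .transpose(0, 2, 1, 3, 4)       # group the two patch-grid axes first
                 .reshape(-1, patch * patch * C))

frame = np.arange(64 * 64 * 3, dtype=np.float32).reshape(64, 64, 3)
tokens = tokenize_frame(frame)   # (64/8)*(64/8) = 64 tokens of 8*8*3 = 192 values each
```

Once frames are token sequences, "predicting the next frame" reduces to predicting the next block of tokens, the same autoregressive setup used for text.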

Genie 2 also uses a dynamics model that plays a central role in maintaining the continuity and coherence of the generated video. The dynamics model takes the video tokens and inferred actions and generates the next frame, ensuring that the virtual world remains consistent and logical. It predicts what happens next in a game or virtual space based on the player’s actions and choices, and this prediction capability makes the virtual worlds feel more responsive and interactive as the AI adapts to the player’s real-time decisions.
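The interface of such a dynamics model can be sketched as a function from (current tokens, action) to next tokens, rolled out autoregressively. The single linear layer below is a hypothetical stand-in for the large transformer the article describes; only the input/output contract is the point.

```python
import numpy as np

class DynamicsModel:
    """Toy dynamics model: next_tokens = f(current_tokens, action).
    A single linear layer stands in for the real learned model."""
    def __init__(self, token_dim, num_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(token_dim + num_actions, token_dim))
        self.num_actions = num_actions

    def step(self, tokens, action):
        a = np.eye(self.num_actions)[action]                         # one-hot action code
        inp = np.concatenate([tokens, np.tile(a, (tokens.shape[0], 1))], axis=1)
        return np.tanh(inp @ self.W)                                 # predicted next-frame tokens

model = DynamicsModel(token_dim=16, num_actions=4)
rollout = np.random.default_rng(1).normal(size=(64, 16))             # 64 tokens for frame 0
for action in [0, 2, 1]:                                             # autoregressive rollout
    rollout = model.step(rollout, action)
```

Feeding each predicted frame back in as the next input is what lets a single step function generate an open-ended, playable sequence.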

The system also includes a latent action model (LAM), which helps Genie 2 understand what happens between video frames. The LAM analyzes video sequences to infer the unspoken actions, such as a character moving or interacting with objects. This feature is important in video generation because it allows the AI to create more accurate and dynamic interactions between objects and characters within a virtual world.
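One common way to realize a LAM, used here purely as an illustrative assumption, is to embed the change between two consecutive frames and quantize it against a small learned codebook, so each frame-to-frame transition maps to a discrete latent action.

```python
import numpy as np

def infer_latent_action(frame_t, frame_t1, codebook):
    """LAM sketch: embed the frame-to-frame change and quantize it to the
    nearest codebook entry; the entry's index is the discrete latent action."""
    delta = (frame_t1 - frame_t).ravel()
    delta = delta / (np.linalg.norm(delta) + 1e-8)      # direction of change only
    dists = np.linalg.norm(codebook - delta, axis=1)    # distance to each action code
    return int(np.argmin(dists))

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))                     # 8 latent actions, 16-dim codes
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)
f0 = rng.normal(size=(4, 4))
f1 = f0 + codebook[3].reshape(4, 4)                     # transition along action code 3
action = infer_latent_action(f0, f1, codebook)
```

Because the action labels are inferred rather than given, a model trained this way can learn controllable behavior from raw, unannotated video, which is exactly what lets Genie 2 train on internet-scale footage.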

In conclusion, Google Genie 2’s innovative approach to game and world creation is a game-changer for the industry. It enables users to create complex virtual environments with minimal effort and technical expertise, opening up new possibilities for professionals and amateurs. Game developers, for instance, can use Genie 2 to quickly prototype new worlds and gameplay experiences, saving valuable time and resources. At the same time, hobbyists and aspiring creators can explore their ideas without needing advanced programming skills.


Check out the details here. All credit for this research goes to the researchers of this project.


The post Google DeepMind Introduces Genie 2: An Autoregressive Latent Diffusion Model for Virtual World and Game Creation with Minimal Input appeared first on MarkTechPost.
