TechCrunch News, December 4, 2024
DeepMind’s Genie 2 can generate interactive worlds that look like video games

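DeepMind has launched Genie 2, a model that can generate a wide variety of playable 3D worlds. It creates interactive, real-time scenes from a single image and a text description, is trained on video, and can simulate many elements of a scene. The model has raised some controversy, and DeepMind is positioning it as a research and creative tool.

🎮 DeepMind launches Genie 2, which generates scenes from a single image and a text description

💻 The model simulates many elements of a scene; its training data may include playthroughs of popular games

🔍 Questions remain over data sourcing; the model is positioned as a research and creative tool

🌟 It could become a key component in developing future AI agents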

DeepMind, Google’s AI research org, has unveiled a model that can generate an “endless” variety of playable 3D worlds.

Called Genie 2, the model — the successor to DeepMind’s Genie, which was released earlier this year — can generate an interactive, real-time scene from a single image and text description (e.g. “A cute humanoid robot in the woods”). In this way, it’s similar to models under development by Fei-Fei Li’s company, World Labs, and Israeli startup Decart.

DeepMind claims that Genie 2 can generate a “vast diversity of rich 3D worlds,” including worlds in which users can take actions like jumping and swimming by using a mouse or keyboard. Trained on videos, the model’s able to simulate object interactions, animations, lighting, physics, reflections, and the behavior of “NPCs.”

Image Credits: DeepMind

Many of Genie 2’s simulations look like AAA video games — and the reason could well be that the model’s training data contains playthroughs of popular games. But DeepMind, like many AI labs, wouldn’t reveal many details about its data sourcing methods, likely for competitive reasons.

One wonders about the IP implications. DeepMind — being a Google subsidiary — has unfettered access to YouTube, and Google has previously implied that its ToS gives it permission to use YouTube videos for model training. But is Genie 2 basically creating unauthorized copies of the games it “watched”? That’s for the courts to decide, I suppose.

Genie 2 can generate consistent worlds with different perspectives, like first-person and isometric views, for up to a minute, with the majority lasting 10-20 seconds.

“Genie 2 responds intelligently to actions taken by pressing keys on a keyboard, identifying the character and moving it correctly,” DeepMind explained in a blog post. “For example, our model [can] figure out that arrow keys should move a robot and not trees or clouds.”

Image Credits: DeepMind

Most models like Genie 2 — world models, if you will — can simulate games and 3D environments, but with artifacting, consistency, and hallucinatory issues. For example, Decart’s Minecraft simulator, Oasis, has a low resolution and quickly “forgets” the layout of levels.

Genie 2, however, can remember parts of a simulated scene that aren’t in view and render them accurately when they become visible again, DeepMind claims. (World Labs’ models can do this too.)

Now, games created with Genie 2 wouldn’t be all that fun, really. Having your progress erased every minute would drive anyone up the wall. So DeepMind’s positioning the model as more of a research and creative tool — a tool for prototyping “interactive experiences” and evaluating AI agents.

“Thanks to Genie 2’s out-of-distribution generalization capabilities, concept art and drawings can be turned into fully interactive environments,” DeepMind wrote. “And by using Genie 2 to quickly create rich and diverse environments for AI agents, our researchers can generate evaluation tasks that agents have not seen during training.”

Image Credits: DeepMind

DeepMind says that while Genie 2 is in the early stages, the lab believes it’ll be a key component in developing AI agents of the future.

Google has poured increasing resources into world models, which promise to be the next big thing in AI. In October, DeepMind hired Tim Brooks, who was heading development on OpenAI’s Sora video generator, to work on video generation technologies and world simulators.
