TechCrunch News, May 4, 00:46
Google’s Gemini has beaten Pokémon Blue (with a little help)

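Google’s Gemini 2.5 Pro AI model has completed Pokémon Blue, drawing wide attention. An independent software engineer created the “Gemini Plays Pokemon” livestream, and Google executives have voiced their support. Anthropic’s Claude AI model has also made progress in Pokémon Red. Both models need assistance to play, but Gemini’s success is being seen as an important breakthrough for AI in gaming. The developer stresses that this is not a benchmark of how well LLMs play games, and that Gemini’s success depended on later optimization and tuning.

🎮 Google’s Gemini 2.5 Pro AI model completed Pokémon Blue, and Google CEO Sundar Pichai celebrated on social media.

🤖 Anthropic’s Claude AI model has also made progress in Pokémon Red, which suggests AI has considerable potential in gaming.

🛠️ Both Gemini and Claude rely on “agent harness” tools, which give the model game screenshots and extra information to help it make decisions and carry out actions.

🧑‍💻 Developer Joel Z acknowledged making some “dev interventions” to help Gemini finish the game, but stressed that they are meant to improve Gemini’s overall decision-making and reasoning, not to cheat.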

Google’s most expensive AI model seems to have crossed a major milestone: Beating a 29-year-old video game.

Last night, Google CEO Sundar Pichai posted triumphantly on X, “What a finish! Gemini 2.5 Pro just completed Pokémon Blue!”

To be clear, the Gemini Plays Pokemon livestream was created by (in his own words) “a 30 year old software engineer unaffiliated with Google” who goes by Joel Z. But Google executives have been cheering the effort on.

For example, Logan Kilpatrick, the product lead for Google AI Studio, posted last month that Gemini was “making great progress at completing Pokémon” and had “earned its 5th badge (next best model only has 3 so far, though with a different agent harness),” leading Pichai to joke, “We are working on API, Artificial Pokémon Intelligence:)”

Why Pokémon? Back in February, Anthropic highlighted progress that its Claude AI models were making in “Pokémon Red,” writing that Claude’s “extended thinking and agent training” gives it “a major boost” on “more unexpected” tasks, like playing a classic game. (“Pokémon Red” and “Blue” are different versions of a Game Boy title first released in 1996 and tied to the long-running Pokémon franchise.) There’s even a Claude Plays Pokemon Twitch channel that Joel Z cited as an inspiration.

Despite its progress, Claude does not appear to have beaten “Pokémon Red” yet. Does that mean Gemini is objectively better at the game? On his Twitch page, Joel Z urged viewers, “Please don’t consider this a benchmark for how well an LLM can play Pokemon. You can’t really make direct comparisons — Gemini and Claude have different tools and receive different information.”

And both AI models need help to play the game — that’s where the aforementioned agent harnesses come in, providing the models with game screenshots overlaid with additional information, allowing the model to decide how to respond (which may involve calling specialized agents), and then pressing the button that corresponds with the AI’s instruction.
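
The loop described above (capture a screenshot, attach extra information, let the model choose, press the matching button) can be sketched roughly as follows. This is only an illustration and not Joel Z's or Anthropic's actual harness: `FakeEmulator`, `Observation`, `run_harness`, the button list, and the overlay text are all hypothetical names and values, the model is a plain callable standing in for an LLM call, and the "specialized agents" the article mentions are left out.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: button names modeled on the original Game Boy's controls.
VALID_BUTTONS = {"A", "B", "UP", "DOWN", "LEFT", "RIGHT", "START", "SELECT"}


@dataclass
class Observation:
    screenshot: bytes  # raw frame captured from the emulator
    overlay: str       # extra text the harness adds for the model


class FakeEmulator:
    """Hypothetical stand-in for a real emulator; it only records presses."""

    def __init__(self) -> None:
        self.pressed: List[str] = []

    def capture(self) -> bytes:
        # A real harness would return image data; this returns a label.
        return f"frame-{len(self.pressed)}".encode()

    def press(self, button: str) -> None:
        self.pressed.append(button)


def run_harness(emulator: FakeEmulator,
                model: Callable[[Observation], str],
                steps: int) -> List[str]:
    """Run the observe, decide, press loop for a fixed number of steps."""
    for _ in range(steps):
        obs = Observation(
            screenshot=emulator.capture(),
            overlay=f"buttons pressed so far: {len(emulator.pressed)}",
        )
        choice = model(obs).strip().upper()
        # Assumption: replies that do not name a button are ignored.
        if choice in VALID_BUTTONS:
            emulator.press(choice)
    return emulator.pressed


if __name__ == "__main__":
    # Scripted replies stand in for model output; "jump" is not a button.
    replies = iter(["a", " up ", "jump"])
    print(run_harness(FakeEmulator(), lambda obs: next(replies), steps=3))
    # prints ['A', 'UP']
```

Because each harness decides what goes into the overlay and which actions are allowed, two models run through different harnesses see different games, which is the reason Joel Z gives for not treating the runs as a benchmark.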

Joel Z acknowledged that there were other “dev interventions” to help Gemini complete the game, but insisted that it’s not cheating.

“My interventions improve Gemini’s overall decision-making and reasoning abilities,” he says. “I don’t give specific hints — there are no walkthroughs or direct instructions for particular challenges like Mt. Moon. The only thing that comes even close is letting Gemini know that it needs to talk to a Rocket Grunt twice to obtain the Lift Key, which was a bug that was later fixed in Pokemon Yellow.”

Plus, he said, “Gemini Plays Pokémon is still actively being developed, and the framework continues to evolve.”
