Mashable, March 25, 06:13
Anthropic’s AI agent Claude is playing Pokémon and just can’t catch ‘em all

 

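Anthropic has launched an eye-catching experiment in which its AI model Claude livestreams an attempt at Pokémon Red on Twitch. Claude showed progress early in the game, for example defeating Brock within hours, but its progress then stalled. Compared with human players, Claude moves clumsily through the game, especially when navigating the map. Even so, Claude's improvement is evident: it can plan ahead, remember objectives, and learn from its mistakes. The experiment shows that AI has an edge in handling text but still struggles with visual and spatial reasoning, and it is a reminder that AI is a long way from taking over the world.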

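🕹️ Anthropic launched the "Claude Plays Pokémon" livestream project to test how its AI model performs in the game.

🚀 Earlier versions of Claude performed poorly at basic tasks, for example frequently running away from battles. After several iterations, Claude 3.7 Sonnet made significant progress in the game and was able to defeat a Gym Leader within hours.

🤔 Unlike previous versions, Claude 3.7 Sonnet can plan ahead, remember objectives, and learn from its mistakes; it also builds a knowledge base and simulates button presses.

🐌 Despite this progress, Claude 3.7 Sonnet appears to have stalled in the game: for example, it spent 78 hours in Mt. Moon, which human players typically clear in a few hours.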

Last month, Anthropic, the AI startup valued at $61.5 billion, set up a gaming livestream on Twitch. Gaming livestreams are nothing new on Twitch, but this one is a little different: Claude, Anthropic's AI model, is attempting to beat Pokémon Red.

We are now one month in, and the livestream is still going. However, Claude has not progressed all that much. At this rate, Anthropic's AI agent may never be the very best, like no one ever was.

According to Anthropic, when it first launched the "Claude Plays Pokémon" project, earlier versions of Claude failed at some very basic tasks. For example, in June 2024, Claude 3.5 would try to run away from almost every battle.

A few months and a few versions of Claude later, Anthropic said there was a stark change. In February 2025, Anthropic gave Claude 3.7 Sonnet a whirl at playing Pokémon. 

"Within hours, Claude defeated Brock. Days later, it trounced Misty," Anthropic said. "Progress that older models had little hope of achieving."

Anthropic said that Claude 3.7 Sonnet could plan ahead, remember objectives, and learn from its mistakes, unlike previous versions of the AI agent. It also built a knowledge base, saw the screen, and simulated button presses.

However, the progress Claude 3.7 Sonnet originally made in the game seems to have stalled.

For example, livestream viewers watched as Claude 3.7 took 78 hours to get through Mt. Moon in the game. On Reddit, gamers estimated that it would typically take a child just a few hours to advance through the same stage.

Claude can be seen going in circles, stumbling around the same paths, and often knocking into walls as it tries to get around the game.

The livestream is engaging, especially as a text box lays out Claude's "thinking" as the AI agent tries to figure out what moves to make next.

In an interview with Ars Technica, Anthropic engineers said Claude has an easier time with aspects of the game that involve text, such as Pokémon battles. However, it struggles with the more visual aspects of the game, such as moving from town to town on the map.

Claude 3.7 Sonnet has gone much further in the game than previous Claude models, so there's been progress. However, for those warning that AI will soon be able to take over the world, we're nowhere close to that being a reality yet. Claude still has 151 Pokémon to catch.
