The Verge - Artificial Intelligence, July 11, 2024
Google says Gemini AI is making its robots smarter

Google DeepMind is using Gemini AI to improve its robots' navigation and task-completion abilities. Thanks to Gemini 1.5 Pro's long context window, its R2-T robots can understand natural language instructions and carry out complex tasks in the environments they have toured, such as guiding a user to a power outlet or checking what is in the fridge.

🤖 Gemini AI lets the R2-T robot learn the layout of a space from a video tour, so it can act on verbal and image-based instructions grounded in what it has observed, such as guiding a user to a power outlet.

📚 Gemini 1.5 Pro's long context window lets the robot take in far more information, making it easier for users to interact with it.

🔍 The researchers found that Gemini helps the robot go beyond navigation and plan how to fulfill complex instructions, such as checking whether there is Coke in the fridge.

🚀 Although each instruction takes some time to process, the Gemini-powered robot achieved a 90 percent success rate, demonstrating its effectiveness across a large operating area.

🔧 DeepMind plans to investigate these results further to advance robots' environment mapping and task execution.

DeepMind found “preliminary evidence” that Gemini enables its robots to plan how to undertake complex tasks from simple instructions. | Image: Google DeepMind

Google is training its robots with Gemini AI so they can get better at navigating spaces and completing tasks. The DeepMind robotics team explained in a new research paper how Gemini 1.5 Pro’s long context window — which dictates how much information an AI model can process at once — allows users to interact with its R2-T robots more easily using natural language instructions.

This works by filming a video tour of a designated area, such as a home or office space, with researchers using Gemini 1.5 Pro to make the robot “watch” the video to learn about the environment. The robot can then undertake commands based on what it has observed using verbal and/or image outputs — such as guiding users to a power outlet after being shown a phone and asked “where can I charge this?” DeepMind says its Gemini-powered robot had a 90 percent success rate across over 50 user instructions that were given in a 9,000-plus-square-foot operating area.
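
DeepMind’s own robot stack is not public, but the “tour first, then ask” workflow described above can be sketched against the publicly available google-generativeai Python SDK. In the sketch below, the file name, the prompt wording, and the hand-off to a navigation system are illustrative assumptions, not details from the paper.

```python
# A minimal sketch of the "video tour" idea described above, using the public
# google-generativeai SDK. DeepMind's actual robot system is not open source;
# the file name, prompt, and example output here are illustrative assumptions.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumption: you supply your own key

# 1. Upload the recorded tour of the home or office so that Gemini 1.5 Pro's
#    long context window can hold the whole walkthrough.
tour = genai.upload_file("office_tour.mp4")   # hypothetical local recording
while tour.state.name == "PROCESSING":        # video uploads finish asynchronously
    time.sleep(5)
    tour = genai.get_file(tour.name)

model = genai.GenerativeModel("gemini-1.5-pro")

# 2. Ground a natural-language request in the toured space.
request = "Where can I charge this phone?"
response = model.generate_content([
    tour,
    "You control a mobile robot in the space shown in this tour video. "
    "Name the landmark the robot should guide the user to, and briefly "
    f"describe the route, for this request: {request}",
])

print(response.text)  # e.g. a landmark like "the power outlet by the window"
```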

Researchers also found “preliminary evidence” that Gemini 1.5 Pro enabled its droids to plan how to fulfill instructions beyond just navigation. For example, when a user with lots of Coke cans on their desk asks the droid if their favorite drink is available, the team said Gemini “knows that the robot should navigate to the fridge, inspect if there are Cokes, and then return to the user to report the result.” DeepMind says it plans to investigate these results further.
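
The paper describes this planning capability only at a high level, so the following is a separate, hypothetical sketch of how such a request could be posed to Gemini 1.5 Pro. The prompt wording and the example reply are assumptions.

```python
# A hypothetical sketch of the planning behaviour described above: asking
# Gemini 1.5 Pro for a high-level step plan. The prompt and the example reply
# are assumptions, not DeepMind's actual prompting.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

prompt = (
    "A user whose desk is covered in empty Coke cans asks a mobile robot: "
    "'Do I have any of my favorite drink left?'\n"
    "Reply with a short numbered list of high-level steps the robot should take."
)
plan = model.generate_content(prompt)
print(plan.text)
# A plausible reply, matching the behaviour DeepMind reports:
#   1. Navigate to the fridge.
#   2. Check the fridge for Coke.
#   3. Return to the user and report whether any Coke is available.
```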

The video demonstrations provided by Google are impressive, though the obvious cuts after the droid acknowledges each request hide the fact that it takes between 10 and 30 seconds to process these instructions, according to the research paper. It may take some time before we’re sharing our homes with more advanced environment-mapping robots, but at least these ones might be able to find our missing keys or wallets.
