The Verge - Artificial Intelligences, March 12, 23:29
Google DeepMind’s new AI models help robots perform physical tasks, even without training

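Google DeepMind has launched two new AI models designed to help robots perform a wider range of real-world tasks than ever before. Gemini Robotics is built on Gemini 2.0 and transfers Gemini’s multimodal world understanding to the real world by adding physical actions as a new modality. The model makes advances in three key areas (generality, interactivity, and dexterity), allowing it to interact better with people and their environment and to perform more precise physical tasks. Google DeepMind is also launching Gemini Robotics-ER, an advanced vision-language model that can understand a complex and dynamic world and connect to existing low-level controllers to enable new capabilities.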

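🤖 Gemini Robotics is a vision-language-action model built on Gemini 2.0. It can understand new situations even if it has not been trained on them, and it transfers Gemini’s multimodal world understanding to the real world by adding physical actions as a new modality.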

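🤝 Gemini Robotics makes significant advances in three key areas: generality, interactivity, and dexterity. It can interact better with people and their environment and perform more precise physical tasks, such as folding a piece of paper or removing a bottle cap.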

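🧠 Gemini Robotics-ER is an advanced vision-language model designed to understand our complex and dynamic world. It lets robots carry out tasks such as packing a lunchbox, which requires knowing where the objects are, how to open the box, and how to place the items.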

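🛡️ Google DeepMind is developing a “layered approach” to safety: Gemini Robotics-ER models are trained to evaluate whether a potential action is safe to perform in a given scenario, and the company is releasing new benchmarks and frameworks to help further safety research in the AI industry.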

Google DeepMind is launching two new AI models designed to help robots “perform a wider range of real-world tasks than ever before.” The first, called Gemini Robotics, is a vision-language-action model capable of understanding new situations, even if it hasn’t been trained on them.

Gemini Robotics is built on Gemini 2.0, the latest version of Google’s flagship AI model. During a press briefing, Carolina Parada, the senior director and head of robotics at Google DeepMind, said Gemini Robotics “draws from Gemini’s multimodal world understanding and transfers it to the real world by adding physical actions as a new modality.”

The new model makes advancements in three key areas that Google DeepMind says are essential to building helpful robots: generality, interactivity, and dexterity. In addition to the ability to generalize to new scenarios, Gemini Robotics is better at interacting with people and their environment. It’s also capable of performing more precise physical tasks, such as folding a piece of paper or removing a bottle cap.

“While we have made progress in each one of these areas individually in the past with general robotics, we’re bringing [drastically] increasing performance in all three areas with a single model,” Parada said. “This enables us to build robots that are more capable, that are more responsive and that are more robust to changes in their environment.”

Google DeepMind is also launching Gemini Robotics-ER (or embodied reasoning), which the company describes as an advanced visual language model that can “understand our complex and dynamic world.”

As Parada explains, when you’re packing a lunchbox and have items on a table in front of you, you’d need to know where everything is, as well as how to open the lunchbox, how to grasp the items, and where to place them. That’s the kind of reasoning Gemini Robotics-ER is expected to do. It’s designed for roboticists to connect with existing low-level controllers — the system that controls a robot’s movements — allowing them to enable new capabilities powered by Gemini Robotics-ER.
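
The split Parada describes, with a reasoning layer that decides what to do and a low-level controller that moves the robot, can be pictured with a toy sketch. This is only an illustration under stated assumptions: it does not use any Gemini Robotics-ER interface, and every name in it (`Step`, `plan_lunchbox`, `run`, `logging_controller`) is hypothetical. A hand-written planner stands in for the model, and the "controller" merely logs the command it receives.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Position = Tuple[float, float]  # (x, y) on the table; units are arbitrary here


@dataclass(frozen=True)
class Step:
    """One high-level instruction handed to the low-level controller (hypothetical)."""
    action: str         # "open", "grasp", or "place"
    target: str         # name of the object the action applies to
    position: Position  # where the action should happen


def plan_lunchbox(scene: Dict[str, Position], box: str = "lunchbox") -> List[Step]:
    """Stand-in for the reasoning layer: turn object locations into ordered steps."""
    if box not in scene:
        raise ValueError(f"no {box!r} in scene")
    steps = [Step("open", box, scene[box])]
    # Pack the remaining items in a fixed (alphabetical) order.
    for name in sorted(n for n in scene if n != box):
        steps.append(Step("grasp", name, scene[name]))  # pick up where the item is
        steps.append(Step("place", name, scene[box]))   # put down where the box is
    return steps


def run(steps: List[Step], controller: Callable[[Step], str]) -> List[str]:
    """Send each step to whatever low-level controller the robot already has."""
    return [controller(step) for step in steps]


def logging_controller(step: Step) -> str:
    """Placeholder controller: describes the motion instead of performing it."""
    x, y = step.position
    return f"{step.action} {step.target} at ({x:.1f}, {y:.1f})"


if __name__ == "__main__":
    scene = {"lunchbox": (0.0, 0.0), "apple": (0.3, 0.1), "sandwich": (-0.2, 0.4)}
    for line in run(plan_lunchbox(scene), logging_controller):
        print(line)
```

Because the planner only emits `Step` objects, swapping `logging_controller` for a real motion controller would not change the planning code, which is the kind of separation the article attributes to the design.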

In terms of safety, Google DeepMind researcher Vikas Sindhwani told reporters that the company is developing a “layered-approach,” adding that Gemini Robotics-ER models “are trained to evaluate whether or not a potential action is safe to perform in a given scenario.” The company is also releasing new benchmarks and frameworks to help further safety research in the AI industry. Last year, Google DeepMind introduced its “Robot Constitution,” a set of Isaac Asimov-inspired rules for its robots to follow.

Google DeepMind is working with Apptronik to “build the next generation of humanoid robots.” It’s also giving “trusted testers” access to its Gemini Robotics-ER model, including Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools. “We’re very focused on building the intelligence that is going to be able to understand the physical world and be able to act on that physical world,” Parada said. “We’re very excited to basically leverage this across multiple embodiments and many applications for us.”
