MarkTechPost@AI 2024年11月09日
Google DeepMind Researchers Propose RT-Affordance: A Hierarchical Method that Uses Affordances as an Intermediate Representation for Policies
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

近年来,大型预训练模型在学习机器人策略方面取得了显著进展。本文介绍了Google DeepMind提出的RT-Affordance方法,该方法利用affordance作为中间表示,构建分层模型,以提高机器人策略的鲁棒性和泛化能力。RT-Affordance首先根据任务语言和初始图像预测affordance计划,然后将该计划与语言指令结合,指导机器人执行任务。该方法结合了网络数据集和机器人轨迹数据,并在抓取、放置等任务中取得了优于传统方法的性能,展现了其在解决复杂机器人操作任务方面的潜力。

🤔 **RT-Affordance是一种分层模型,利用affordance作为中间表示来指导机器人策略。**该方法首先根据任务语言和初始图像预测affordance计划,然后将该计划与语言指令结合,指导机器人执行任务。

🤝 **RT-Affordance结合了网络数据集和机器人轨迹数据,提高了模型的泛化能力。**这种方法能够更好地处理新的物体、场景和任务,例如抓取形状复杂的家庭用品(如水壶、簸箕和锅)。

📈 **RT-Affordance在机器人抓取和放置等任务中取得了显著的性能提升。**例如,在抓取任务中,RT-Affordance的成功率达到68%-76%,显著高于传统方法。

⚠️ **RT-Affordance在面对完全新的动作或技能时,性能略有下降。**这表明该方法仍需进一步改进,以适应更广泛的任务场景。

💡 **RT-Affordance为机器人领域未来的研究提供了新的方向。**基于affordance的策略引导方法有望成为未来研究的基准,推动机器人操作技术的发展。

In recent years, there has been significant development in the field of large pre-trained models for learning robot policies. The term “policy representation” here refers to the different ways of interfacing with the decision-making mechanisms of robots, which can potentially facilitate generalization to new tasks and environments. Vision-language-action (VLA) models are pre-trained with large-scale robot data to integrate visual perception, language understanding, and action-based decision-making to guide robots in various tasks. On top of vision-language models (VLMs), they come up with the promise of generalization to new objects, scenes, and tasks. However, VLAs still need to be more reliable to be deployed outside the narrow lab settings they are trained in. While these drawbacks can be mitigated by expanding the scope and diversity of robot datasets, this is highly resource-intensive and challenging to scale. In simple words, these policy representations either need to provide more context or over-specified context that yields less robust policies.


Existing policy representations such as language, goal images, and trajectory sketches are widely used and are helpful. One of the most common policy representations is conditioning on language. Most of the robot datasets are labeled with underspecified descriptions of the task, and language-based guidance does not provide enough guidance on how to perform the task. Goal image-conditioned policies provide detailed spatial information about the final goal configuration of the scene. However, goal images are high-dimensional, which presents learning challenges due to over-specification issues. Intermediate representation such as Trajectory sketches, or key points attempts to provide spatial plans for guiding the robot’s actions. While these spatial plans provide guidance, they still lack sufficient information for the policy on how to perform specific movements.

A team of researchers from Google DeepMind conducted detailed research on policy representation for robots and proposed RT-Affordance which is a hierarchical model that first creates an affordance plan given the task language, and then uses the policy on this affordance plan to guide the robot’s actions for manipulation. In robotics, affordance refers to the potential interactions that an object enables for a robot, based on its shape, size etc. The RT-Affordance model can easily connect heterogeneous sources of supervision including large web datasets and robot trajectories. 

First, the affordance plan is predicted for the given task language and the initial image of the task. This affordance plan is then combined with language instructions to condition the policy for task execution. It is then projected onto the image, and following this, the policy is conditioned on images overlaid with the affordance plan. The model is co-trained on web datasets (the largest data source), robot trajectories, and a modest number of cheap-to-collect images labeled with affordances. This approach benefits from leveraging both robot trajectory data and extensive web datasets, allowing the model to generalize well across new objects, scenes, and tasks.

The research team conducted various experiments that mainly focused on how affordances help to improve robotic grasping, especially for movements of household items with complex shapes (like kettles, dustpans, and pots). A detailed evaluation showed that RT-A remains robust across various out-of-distribution (OOD) scenarios, such as novel objects, camera angles, and backgrounds. The RT-A model performed better than RT-2 and its goal-conditioned variant, achieving success rates of 68%-76% compared to RT-2’s 24%-28%. In tasks beyond grasping, like placing objects into containers, RT-A showed a significant performance with a 70% success rate. However, the performance of RT-A slightly dropped when it faced entirely new objects.

In conclusion, affordance-based policies are well-guided and also perform in a better way. The RT- Affordance method significantly improves the robustness and generalization of robot policies, which makes it a valuable tool for diverse manipulation tasks. Although it can not adapt to entirely new moments or skills, RT-Affordance surpasses traditional methods in terms of performance. This affordance technique opens the gate for various future research opportunities in robotics and can serve as a baseline for future studies!


Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter.. Don’t Forget to join our 55k+ ML SubReddit.

[Sponsorship Opportunity with us] Promote Your Research/Product/Webinar with 1Million+ Monthly Readers and 500k+ Community Members

The post Google DeepMind Researchers Propose RT-Affordance: A Hierarchical Method that Uses Affordances as an Intermediate Representation for Policies appeared first on MarkTechPost.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

机器人策略 RT-Affordance affordance 机器人学习 Google DeepMind
相关文章