MarkTechPost@AI December 8, 2024
Meet GRAPE: A Plug-and-Play Algorithm to Generalize Robot Policies via Preference Alignment

 

GRAPE is a novel plug-and-play algorithm that generalizes robot policies via preference alignment. It adopts trajectory-wise policy optimization, aligning robot policies by implicitly modeling rewards from successful and failed trial sequences, which strengthens generalization across diverse manipulation tasks. GRAPE decomposes complex tasks into multiple independent stages, uses a large vision model to propose keypoints for each stage, and associates them with spatiotemporal constraints that can be customized for different manipulation objectives, including task completion, robot-interaction safety, and operational cost-efficiency, marking a significant advance in robot policy development.

🤖 GRAPE aligns robot policies through trajectory-wise policy optimization, implicitly modeling rewards from successful and failed trial sequences, addressing a fundamental limitation of VLA models' generalization.

🛠️ The method decomposes complex manipulation tasks into multiple independent stages, uses a large vision model to propose keypoints for each stage, and associates them with spatiotemporal constraints.

🎯 These spatiotemporal constraints can be customized for different manipulation objectives, including task completion, robot-interaction safety, and operational cost-efficiency, offering unprecedented flexibility.

🔬 The research team evaluated GRAPE comprehensively in both simulated and real-world robotic environments; GRAPE outperformed existing models across various generalization dimensions.

🌎 In real-world experiments, GRAPE achieved a 67.5% success rate on in-domain tasks and maintained a 52.3% average success rate across visual, action, and language scenarios on challenging generalization tasks.

The field of robotic manipulation has witnessed a remarkable transformation with the emergence of vision-language-action (VLA) models. These advanced computational frameworks have demonstrated significant potential in executing complex manipulation tasks across diverse environments. Despite their impressive capabilities, VLA models encounter substantial challenges in generalizing across novel contexts, including different objects, environments, and semantic scenarios. 

The fundamental limitation stems from current training methodologies, particularly supervised fine-tuning (SFT), which predominantly relies on behavioral imitation through successful action rollouts. This approach restricts models from developing a comprehensive understanding of task objectives and potential failure mechanisms. Consequently, the models often struggle to adapt to nuanced variations and unforeseen scenarios, highlighting the critical need for more sophisticated training strategies.

Previous research in robotic learning predominantly employed hierarchical planning strategies, with models like Code as Policies and EmbodiedGPT utilizing large language models and vision-language models to generate high-level action plans. These approaches typically utilize large language models to create action sequences, followed by low-level controllers to resolve local trajectory challenges. However, such methodologies demonstrate significant limitations in skill adaptability and generalization across everyday robotic manipulation tasks.

VLA models have pursued two primary approaches to action planning: action space discretization and diffusion models. The discretization approach, exemplified by OpenVLA, involves uniformly truncating action spaces into discrete tokens, while preserving autoregressive language decoding objectives. Diffusion models, conversely, generate action sequences through multiple denoising steps rather than producing singular stepwise actions. Despite these structural variations, these models consistently rely on supervised training using successful action rollouts, which fundamentally constrains their generalizability to novel manipulation scenarios.
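To make the discretization idea concrete, here is a minimal sketch of uniform binning of a continuous action vector into discrete tokens and back. This is not OpenVLA's actual tokenizer; the 256-bin vocabulary, the symmetric action bounds, and the 7-DoF layout are illustrative assumptions.

```python
import numpy as np

def discretize_action(action, low, high, n_bins=256):
    """Map each continuous action dimension to one of n_bins discrete
    tokens by uniform binning over [low, high]."""
    action = np.clip(action, low, high)
    frac = (action - low) / (high - low)          # normalize to [0, 1]
    return np.minimum((frac * n_bins).astype(int), n_bins - 1)

def undiscretize_action(tokens, low, high, n_bins=256):
    """Invert the binning by mapping each token to its bin center."""
    return low + (tokens + 0.5) / n_bins * (high - low)

# Hypothetical 7-DoF action: 3 position deltas, 3 rotation deltas, gripper.
low = np.full(7, -1.0)
high = np.full(7, 1.0)
action = np.array([0.1, -0.5, 0.0, 0.9, -1.0, 1.0, 0.3])
tokens = discretize_action(action, low, high)
recovered = undiscretize_action(tokens, low, high)
print(tokens)
print(np.max(np.abs(recovered - action)))
```

The round trip loses at most half a bin width per dimension, which is the precision/vocabulary-size trade-off that autoregressive decoding over such tokens inherits.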

Researchers from UNC Chapel Hill, the University of Washington, and the University of Chicago introduce GRAPE (Generalizing Robot Policy via Preference Alignment), an approach designed to address fundamental limitations in VLA model training. GRAPE presents a trajectory-wise preference optimization (TPO) technique that aligns robotic policies by implicitly modeling rewards from successful and unsuccessful trial sequences. This methodology enables enhanced generalizability across diverse manipulation tasks by moving beyond traditional training constraints.
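The exact TPO objective is defined in the paper; as a rough sketch of the idea described here (implicit rewards derived from trajectory log-probability ratios against a frozen reference policy, with successful trajectories preferred over failed ones), a DPO-style loss lifted to whole trajectories might look like the following. The function names, toy log-probabilities, and the β value are illustrative assumptions, not the paper's formulation.

```python
import math

def trajectory_logprob(step_logprobs):
    """Log-probability of a trajectory = sum of per-step action log-probs."""
    return sum(step_logprobs)

def tpo_loss(chosen_logps, rejected_logps,
             chosen_ref_logps, rejected_ref_logps, beta=0.1):
    """Preference loss at the trajectory level: the implicit reward of a
    trajectory is beta * (log pi_theta - log pi_ref), and the loss pushes
    the chosen (successful) trajectory's reward above the rejected
    (failed) one's via -log sigmoid(reward margin)."""
    r_chosen = beta * (trajectory_logprob(chosen_logps)
                       - trajectory_logprob(chosen_ref_logps))
    r_rejected = beta * (trajectory_logprob(rejected_logps)
                         - trajectory_logprob(rejected_ref_logps))
    margin = r_chosen - r_rejected
    return math.log1p(math.exp(-margin))   # -log sigmoid(margin)

# Toy rollouts: per-step action log-probs under the current and
# frozen reference policies.
chosen = [-1.0, -0.8, -1.2]        # successful trajectory
rejected = [-1.5, -1.4, -1.6]      # failed trajectory
chosen_ref = [-1.2, -1.0, -1.3]
rejected_ref = [-1.4, -1.3, -1.5]
loss = tpo_loss(chosen, rejected, chosen_ref, rejected_ref)
print(round(loss, 4))
```

Because the reward is implicit in the log-ratio, no separate reward model needs to be trained: widening the margin between successful and failed rollouts directly updates the policy.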

At the core of GRAPE’s approach is a sophisticated decomposition strategy that breaks complex manipulation tasks into multiple independent stages. The method offers unprecedented flexibility by utilizing a large vision model to propose critical keypoints for each stage and associating them with spatial-temporal constraints. These customizable constraints allow alignment with varied manipulation objectives, including task completion, robot interaction safety, and operational cost-efficiency, marking a significant advancement in robotic policy development.
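To illustrate how per-stage keypoints and customizable constraints can trade off the three objectives named above, here is a hypothetical stage cost that scores a candidate trajectory on task completion (endpoint distance to the stage's keypoint goal), safety (penalty for entering an obstacle's keep-out radius), and cost-efficiency (path length). The weights, geometry, and functional form are illustrative assumptions, not GRAPE's actual constraint formulation.

```python
import numpy as np

def stage_cost(traj, goal, obstacle,
               w_task=1.0, w_safety=1.0, w_efficiency=0.1,
               safe_radius=0.05):
    """Multi-objective cost for one stage of a decomposed task.

    traj: (T, 3) array of end-effector positions for the stage.
    goal: keypoint the stage should end at.
    obstacle: point the arm should stay clear of (safety constraint).
    """
    traj = np.asarray(traj, dtype=float)
    # Task completion: distance of the stage endpoint to its keypoint goal.
    task = np.linalg.norm(traj[-1] - goal)
    # Safety: penalize any step that enters the obstacle's keep-out radius.
    dists = np.linalg.norm(traj - obstacle, axis=1)
    safety = np.maximum(safe_radius - dists, 0.0).sum()
    # Cost-efficiency: total path length travelled in the stage.
    efficiency = np.linalg.norm(np.diff(traj, axis=0), axis=1).sum()
    return w_task * task + w_safety * safety + w_efficiency * efficiency

goal = np.array([0.3, 0.0, 0.0])
obstacle = np.array([0.15, 0.0, 0.0])
direct = np.linspace([0.0, 0.0, 0.0], goal, 7)   # passes through the obstacle
detour = direct + np.array([0.0, 0.1, 0.0]) \
    * np.sin(np.linspace(0.0, np.pi, 7))[:, None]
direct_cost = stage_cost(direct, goal, obstacle)
detour_cost = stage_cost(detour, goal, obstacle)
print(direct_cost, detour_cost)
```

With these weights the slightly longer detour scores better than the straight line, because the safety penalty outweighs the extra path length; re-weighting the terms is what lets one set of keypoints serve different objectives.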

The research team conducted comprehensive evaluations of GRAPE across simulation and real-world robotic environments to validate its performance and generalizability. In simulation environments like Simpler-Env and LIBERO, GRAPE demonstrated remarkable capabilities, outperforming existing models Octo-SFT and OpenVLA-SFT by significant margins. Specifically, in Simpler-Env, GRAPE exceeded the performance of previous models by an average of 24.48% and 13.57%, respectively, across various generalization aspects including subject, physical, and semantic domains.

The real-world experimental results further substantiated GRAPE's effectiveness across diverse task scenarios. On in-domain tasks, GRAPE achieved a 67.5% success rate, a 22.5% improvement over OpenVLA-SFT that also clearly surpassed Octo-SFT. On challenging generalization tasks, GRAPE maintained superior results across visual, action, and language grounding scenarios, with a total average success rate of 52.3%, a 19% improvement over existing approaches.

This research introduces GRAPE as a transformative solution to critical challenges confronting VLA models, particularly their limited generalizability and adaptability across manipulation tasks. By implementing a novel trajectory-level policy alignment approach, GRAPE demonstrates remarkable capability in learning from both successful and unsuccessful trial sequences. The methodology offers unprecedented flexibility in aligning robotic policies with diverse objectives, including safety, efficiency, and task completion through innovative spatiotemporal constraint mechanisms. Experimental findings validate GRAPE’s significant advancements, showcasing substantial performance improvements across in-domain and unseen task environments. 


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.


