cs.AI updates on arXiv.org
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos

This paper explores training VLA models on human videos for robot manipulation. Through human-action prediction and simulation, it improves robot manipulation performance and introduces a new simulation benchmark, the Isaac Humanoid Manipulation Benchmark.

arXiv:2507.12440v1 Announce Type: cross Abstract: Real-robot data collection for imitation learning has led to significant advances in robotic manipulation. However, the requirement for robot hardware in the process fundamentally constrains the scale of the data. In this paper, we explore training Vision-Language-Action (VLA) models on egocentric human videos. The benefit of using human videos lies not only in their scale but, more importantly, in the richness of scenes and tasks. With a VLA trained on human videos that predicts human wrist and hand actions, we can perform inverse kinematics and retargeting to convert the human actions into robot actions. We fine-tune the model on a few robot manipulation demonstrations to obtain the robot policy, EgoVLA. We also propose a simulation benchmark, the Isaac Humanoid Manipulation Benchmark, with diverse bimanual manipulation tasks and demonstrations. We fine-tune and evaluate EgoVLA on the Isaac Humanoid Manipulation Benchmark, show significant improvements over baselines, and ablate the importance of human data. Videos can be found on our website: https://rchalyang.github.io/EgoVLA
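To make the inverse-kinematics-and-retargeting step concrete, below is a minimal Python sketch, not the authors' implementation: a toy 2-link planar arm is driven toward a predicted wrist position with damped-least-squares IK. The arm model, link lengths, and all function names here are illustrative assumptions; EgoVLA retargets full wrist poses and hand actions to a bimanual humanoid.

import numpy as np

LINK_LENGTHS = np.array([0.3, 0.25])  # toy 2-link arm, lengths in meters

def forward_kinematics(q):
    """End-effector (x, y) of the planar arm at joint angles q."""
    x = LINK_LENGTHS[0] * np.cos(q[0]) + LINK_LENGTHS[1] * np.cos(q[0] + q[1])
    y = LINK_LENGTHS[0] * np.sin(q[0]) + LINK_LENGTHS[1] * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    """Analytic Jacobian of the end-effector position w.r.t. joint angles."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-LINK_LENGTHS[0] * s1 - LINK_LENGTHS[1] * s12, -LINK_LENGTHS[1] * s12],
        [ LINK_LENGTHS[0] * c1 + LINK_LENGTHS[1] * c12,  LINK_LENGTHS[1] * c12],
    ])

def retarget_wrist_to_joints(wrist_target, q_init, iters=100, damping=1e-2):
    """Damped-least-squares IK: iterate joint angles toward the wrist target."""
    q = q_init.copy()
    for _ in range(iters):
        error = wrist_target - forward_kinematics(q)
        if np.linalg.norm(error) < 1e-4:
            break
        J = jacobian(q)
        # dq = J^T (J J^T + lambda I)^{-1} e  -- damped pseudo-inverse update
        dq = J.T @ np.linalg.solve(J @ J.T + damping * np.eye(2), error)
        q += dq
    return q

if __name__ == "__main__":
    predicted_wrist = np.array([0.35, 0.20])  # stand-in for a VLA prediction
    q = retarget_wrist_to_joints(predicted_wrist, q_init=np.array([0.1, 0.1]))
    print("joint angles:", q, "reached:", forward_kinematics(q))

In a pipeline of this shape, such a solve would run once per predicted action, producing joint targets for the robot controller; the paper's setting additionally handles orientation and finger retargeting, which this sketch omits.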

