cs.AI updates on arXiv.org, Aug 01, 12:08
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models

This article introduces the ViLLA framework, which improves the ability of VLA models to learn robot manipulation policies by introducing latent actions. It performs strongly in both simulated and real-world environments and lays a foundation for future research.

arXiv:2507.23682v1 (Announce Type: cross)

Abstract: Visual-Language-Action (VLA) models have emerged as a popular paradigm for learning robot manipulation policies that can follow language instructions and generalize to novel scenarios. Recent work has begun to explore the incorporation of latent actions, an abstract representation of visual change between two frames, into VLA pre-training. In this paper, we introduce villa-X, a novel Visual-Language-Latent-Action (ViLLA) framework that advances latent action modeling for learning generalizable robot manipulation policies. Our approach improves both how latent actions are learned and how they are incorporated into VLA pre-training. Together, these contributions enable villa-X to achieve superior performance across simulated environments including SIMPLER and LIBERO, as well as on two real-world robot setups including gripper and dexterous hand manipulation. We believe the ViLLA paradigm holds significant promise, and that our villa-X provides a strong foundation for future research.
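For readers unfamiliar with the idea of a latent action, the sketch below illustrates the general recipe: an inverse-dynamics-style encoder maps a pair of frames to a discrete latent code that summarizes the visual change between them, and a forward model checks that the code is informative by reconstructing the later frame. This is a minimal, hypothetical PyTorch sketch of that general family of methods, not villa-X's architecture; all class names, dimensions, and the VQ-style quantization are assumptions.

```python
import torch
import torch.nn as nn

class LatentActionEncoder(nn.Module):
    """Inverse-dynamics-style encoder: maps a pair of frame features to a
    discrete latent action. Architecture and sizes are illustrative only."""
    def __init__(self, frame_dim: int = 512, latent_dim: int = 32, codebook_size: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(2 * frame_dim, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Discrete codebook, as in VQ-style latent action approaches (assumption).
        self.codebook = nn.Embedding(codebook_size, latent_dim)

    def forward(self, frame_t: torch.Tensor, frame_tk: torch.Tensor) -> torch.Tensor:
        z = self.encoder(torch.cat([frame_t, frame_tk], dim=-1))
        # Nearest-codebook quantization (straight-through estimator omitted for brevity).
        dists = torch.cdist(z, self.codebook.weight)
        idx = dists.argmin(dim=-1)
        return self.codebook(idx)


class ForwardModel(nn.Module):
    """Predicts the later frame's features from the current features and the latent action."""
    def __init__(self, frame_dim: int = 512, latent_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(frame_dim + latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, frame_dim),
        )

    def forward(self, frame_t: torch.Tensor, latent_action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([frame_t, latent_action], dim=-1))


if __name__ == "__main__":
    enc, fwd = LatentActionEncoder(), ForwardModel()
    f_t, f_tk = torch.randn(4, 512), torch.randn(4, 512)  # placeholder frame features
    a_latent = enc(f_t, f_tk)                 # abstract "what changed" between the two frames
    recon_loss = nn.functional.mse_loss(fwd(f_t, a_latent), f_tk)
    print(recon_loss.item())
```

Approaches in this family typically learn such latents directly from video frame pairs and then use them as conditioning or supervision during VLA pre-training, which is the setting the abstract describes; the specific learning and integration scheme of villa-X is detailed in the paper itself.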


Related tags

ViLLA framework, robot manipulation, latent actions, VLA models, robot learning