This Thursday | Embodied AI: RDT, the World's Largest Diffusion Foundation Model for Bimanual Robots, from Tsinghua, Open Source

BAAI Community (智源社区), November 5, 2024


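This article introduces the Robotics Diffusion Transformer (RDT), a diffusion foundation model designed to address the challenges of bimanual robot manipulation. RDT uses a novel Transformer design to handle the heterogeneity of multi-modal inputs and to capture the nonlinearity and high-frequency characteristics of robot data. To overcome the scarcity of training data, RDT introduces a physically interpretable unified action space, which makes it easier to learn transferable physical knowledge. RDT is pre-trained on the largest collection of multi-robot datasets to date and scaled to 1.2 billion parameters. In real-robot experiments it significantly outperforms existing methods: it generalizes zero-shot, understands language instructions, learns new skills quickly, and handles complex, dexterous tasks effectively.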

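🤖 RDT is a diffusion foundation model for bimanual robot manipulation. Its Transformer design handles the heterogeneity of multi-modal inputs and captures the nonlinearity and high-frequency characteristics of robot data.

💡 To address the scarcity of training data, RDT introduces a physically interpretable unified action space. This space unifies the action representations of different robots while preserving the physical meaning of the original actions, which makes it easier to learn transferable physical knowledge.

💪 RDT is pre-trained on the largest collection of multi-robot datasets to date and scaled to 1.2 billion parameters, making it the largest diffusion-based foundation model for robot manipulation so far.

🦾 In real-robot experiments, RDT generalizes zero-shot to unseen objects and scenes, understands and follows language instructions, learns new skills from just 1~5 demonstrations, and handles complex, dexterous tasks effectively.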
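One way to picture the unified action space described above is as a fixed-length vector in which every physical quantity owns a fixed slot, so that each robot fills only the slots it actually has. The sketch below is a minimal illustration under that assumption, not RDT's actual implementation: the slot names, the 18-slot layout, and the helpers `to_unified` and `from_unified` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical slot layout (an assumption for illustration, not RDT's real one):
# every physical quantity owns one fixed index in the unified vector.
UNIFIED_SLOTS: List[str] = (
    [f"right_arm_joint_{i}_pos" for i in range(7)]
    + ["right_gripper_open"]
    + [f"left_arm_joint_{i}_pos" for i in range(7)]
    + ["left_gripper_open"]
    + ["base_linear_vel", "base_angular_vel"]
)
SLOT_INDEX: Dict[str, int] = {name: i for i, name in enumerate(UNIFIED_SLOTS)}


def to_unified(action: Dict[str, float]) -> Tuple[List[float], List[int]]:
    """Embed a robot-specific action into the unified vector.

    Returns (vector, mask). Slots the robot does not have stay zero-padded,
    and mask[i] == 1 marks the slots that were actually filled.
    """
    vector = [0.0] * len(UNIFIED_SLOTS)
    mask = [0] * len(UNIFIED_SLOTS)
    for name, value in action.items():
        idx = SLOT_INDEX[name]  # raises KeyError for a quantity with no slot
        vector[idx] = float(value)
        mask[idx] = 1
    return vector, mask


def from_unified(vector: List[float], mask: List[int]) -> Dict[str, float]:
    """Recover the named, physically meaningful quantities of one robot."""
    return {name: vector[i] for name, i in SLOT_INDEX.items() if mask[i]}


# A hypothetical single-arm robot: 6 joints plus a gripper.
single_arm = {f"right_arm_joint_{i}_pos": 0.1 * i for i in range(6)}
single_arm["right_gripper_open"] = 1.0

vec, mask = to_unified(single_arm)
print(len(vec), sum(mask))
```

Because a slot means the same physical quantity for every robot, data from a single-arm robot and a bimanual robot can share one model input and output format, with the mask telling the model which entries are padding.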

Talk topic: RDT, a Large Diffusion Model for Bimanual Robots

Talk date: Thursday, November 7, 10:30-11:30

Talk highlights:

Bimanual manipulation is essential in robotics, yet developing foundation models is extremely challenging due to the inherent complexity of coordinating two robot arms (leading to multi-modal action distributions) and the scarcity of training data. In this paper, we present the Robotics Diffusion Transformer (RDT), a pioneering diffusion foundation model for bimanual manipulation. RDT builds on diffusion models to effectively represent multi-modality, with innovative designs of a scalable Transformer to deal with the heterogeneity of multi-modal inputs and to capture the nonlinearity and high frequency of robotic data. To address data scarcity, we further introduce a Physically Interpretable Unified Action Space, which can unify the action representations of various robots while preserving the physical meanings of original actions, facilitating learning transferable physical knowledge. With these designs, we managed to pre-train RDT on the largest collection of multi-robot datasets to date and scaled it up to 1.2B parameters, which is the largest diffusion-based foundation model for robotic manipulation. We finally fine-tuned RDT on a self-created multi-task bimanual dataset with more than 6K episodes to refine its manipulation capabilities. Experiments on real robots demonstrate that RDT significantly outperforms existing methods. It exhibits zero-shot generalization to unseen objects and scenes, understands and follows language instructions, learns new skills with just 1~5 demonstrations, and effectively handles complex, dexterous tasks.

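In short, the paper presents the Robotics Diffusion Transformer (RDT), a diffusion foundation model for bimanual robot manipulation. RDT builds on diffusion models and uses a scalable Transformer design to handle the heterogeneity of multi-modal inputs and to capture the nonlinearity and high-frequency characteristics of robot data. To address data scarcity, the paper further introduces a physically interpretable unified action space, which unifies the action representations of various robots while preserving the physical meaning of the original actions, making it easier to learn transferable physical knowledge. With these designs, the authors pre-trained RDT on the largest collection of multi-robot datasets to date and scaled it to 1.2 billion parameters, making it the largest diffusion-based foundation model for robot manipulation so far. Finally, the authors fine-tuned RDT on a multi-task bimanual dataset they built themselves to improve its manipulation capabilities. In real-robot experiments, RDT clearly outperforms existing methods: it generalizes zero-shot to unseen objects and scenes, understands and follows language instructions, learns new skills from only 1~5 demonstrations, and handles complex, dexterous tasks effectively. Code and videos are available at https://rdt-robotics.github.io/rdt-robotics/.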
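The abstract notes that coordinating two arms leads to multi-modal action distributions, which is the motivation for a generative (diffusion) policy. The toy sketch below illustrates why this matters; it is not a diffusion model. It assumes a made-up one-dimensional task in which demonstrators steer either left (-1.0) or right (+1.0), and it uses sampling from the empirical distribution as a stand-in for a generative policy. The names `demos`, `mse_prediction`, and `sample_action` are hypothetical.

```python
import random

random.seed(0)

# Toy demonstrations (made up): to pass an obstacle, the demonstrator steers
# left (-1.0) or right (+1.0) about equally often. Both actions are valid.
demos = [random.choice([-1.0, 1.0]) for _ in range(1000)]

# A deterministic policy trained with mean squared error converges to the
# mean of the demonstrated actions, because the mean minimizes squared error.
mse_prediction = sum(demos) / len(demos)


def sample_action(rng: random.Random) -> float:
    """Stand-in for a generative policy: draw one action from the
    empirical distribution instead of averaging over it."""
    return rng.choice(demos)


rng = random.Random(1)
samples = [sample_action(rng) for _ in range(5)]

print("MSE-optimal action:", mse_prediction)  # close to 0: steers into the obstacle
print("Sampled actions:", samples)            # each one is a valid mode
```

The averaged action lies between the two modes and matches neither demonstration, while every sampled action is one that a demonstrator actually took. A diffusion policy aims to get the second behavior from a learned model rather than from a lookup over the data.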

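Speaker:

Songming Liu (刘松铭) is a second-year PhD student in the Department of Computer Science and Technology at Tsinghua University. His main research interests are embodied AI and AI for Science. He has published multiple papers at top conferences such as ICML and NeurIPS, and as an undergraduate he received Tsinghua University's Special Scholarship.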
