MarkTechPost@AI, November 7, 2024
MIT Researchers Developed Heterogeneous Pre-trained Transformers (HPTs): A Scalable AI Approach for Robotic Learning from Heterogeneous Data

Researchers at MIT have developed a framework called Heterogeneous Pre-trained Transformers (HPT) to address the challenges that data heterogeneity poses for robotic learning. By pre-training across different robot embodiments and tasks, HPT builds a shared knowledge representation that lets robots adapt quickly to new tasks and environments. The framework combines proprioceptive and visual information, and uses a shared model trunk with task-specific heads to transfer knowledge across different robot embodiments. Experiments show that HPT significantly improves robots' generalization and performance across multiple simulator benchmarks and real-world settings, offering a new direction for building robotic foundation models.

🤔 **The HPT framework targets the data-heterogeneity problem in robotic learning.** Because robots differ in physical form, sensors, and operating environments, learned policies are hard to generalize to other settings. Through pre-training, HPT builds a shared knowledge representation that lets robots adapt quickly to new tasks and environments.

🤖 **The HPT architecture integrates proprioceptive and visual inputs.** To handle complex, contact-rich, long-horizon behaviors, HPT combines proprioception and vision inputs from different embodiments into a short sequence of tokens, which is then passed to a shared model trunk for processing.

📊 **HPT performs strongly across multiple benchmarks.** The researchers pre-trained on more than 50 individual data sources with models of over 1 billion parameters. The results show that HPT works not only for costly real-world robot operation but also for other types of embodiments, improving fine-tuned policy performance across multiple simulator benchmarks and real-world settings.

💡 **HPT offers a new direction for building robotic foundation models.** Although its model architecture and training procedure apply across different setups, pre-training on diverse data may take longer to converge. This approach to robotic heterogeneity can inspire future research, such as building more capable robotic foundation models.

📚 **HPT leverages pre-trained models to improve generalization and performance in robotic learning.** By pre-training across a variety of robotic tasks and embodiments, the method significantly improves robots' generalization and performance, marking a new advance for the field of robot learning.

Building robotic policies today is difficult: it often requires collecting data specific to each robot, task, and environment, and the learned policies do not generalize beyond those settings. Recent progress in open-source, large-scale data collection has made pre-training on large, high-quality, and diverse data possible. In robotics, however, heterogeneity poses a challenge because robots differ in physical form, sensors, and operating environments. Both proprioceptive and visual information are important for complex, contact-rich, long-horizon behaviors, and learning such information poorly can lead to overfitting behaviors, such as repeating motions for a particular scene, task, or even trajectory.

Current methods in robotic learning collect data from a single robot embodiment for a specific task and train a model on it. This approach is labor-intensive, and its main limitation is that the resulting model does not generalize across tasks and robots. Methods such as pre-training and transfer learning use data from fields like computer vision and natural language processing to help models learn and adapt to new tasks, and recent work shows that small projection layers can combine the pre-trained feature spaces of foundation models. Unlike those fields, robotics has far less data quantity and diversity but much more heterogeneity. Recent advances also combine multimodal data (images, language, audio) for better representation learning.
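To make the projection-layer idea concrete, here is a minimal sketch (hypothetical names, assuming PyTorch; not code from the paper) of how a small learned projection can map features from a frozen pre-trained encoder into the token dimension of a shared policy backbone:

```python
import torch
import torch.nn as nn

class FeatureProjector(nn.Module):
    """Hypothetical sketch: a small projection layer that maps the
    feature space of a frozen pre-trained encoder into the token
    dimension expected by a shared policy backbone."""

    def __init__(self, encoder_dim: int, backbone_dim: int):
        super().__init__()
        self.proj = nn.Linear(encoder_dim, backbone_dim)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, seq_len, encoder_dim) from a pre-trained model
        return self.proj(features)  # -> (batch, seq_len, backbone_dim)

# Usage: align 768-d vision features with a 512-d policy backbone.
projector = FeatureProjector(encoder_dim=768, backbone_dim=512)
tokens = projector(torch.randn(4, 16, 768))  # -> (4, 16, 512)
```

The appeal of this design is that only the tiny projection layer is trained, while the expensive pre-trained encoder stays frozen.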

A group of researchers from MIT CSAIL and Meta proposed a framework named Heterogeneous Pre-trained Transformers (HPT), a family of architectures designed to scalably learn from data across heterogeneous embodiments. HPT's main function is to create a shared representation of tasks that different robots can use in various conditions. Instead of training a robot from scratch for each new task or environment, HPT lets robots reuse pre-learned knowledge, making training faster and more efficient. The architecture combines the proprioception and vision inputs from distinct embodiments into a short sequence of tokens, which is then processed to control robots across a variety of tasks.

The HPT architecture consists of embodiment-specific stems, a shared trunk, and task-specific heads. Inspired by learning from multimodal data, HPT uses embodiment-specific tokenizers, known as stems, to combine sensor inputs such as camera views and body-movement data. The trunk is a shared model, pre-trained across datasets and transferred when adapting to new embodiments and tasks that are unseen during pre-training. Task-specific action decoders, known as heads, produce the action outputs. After each embodiment is tokenized, HPT operates on a shared space of a short sequence of latent tokens.
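The stem/trunk/head decomposition can be illustrated with a short sketch. The following is a hedged, simplified reconstruction in PyTorch (names like `HPTStylePolicy` and all dimensions are assumptions for illustration, not the authors' implementation):

```python
import torch
import torch.nn as nn

class HPTStylePolicy(nn.Module):
    """Illustrative sketch: embodiment-specific stems tokenize
    proprioception and vision, a shared Transformer trunk processes the
    short token sequence, and a task-specific head decodes actions."""

    def __init__(self, proprio_dim=32, vision_dim=768,
                 token_dim=256, action_dim=7):
        super().__init__()
        # Stem: per-embodiment tokenizers, one per sensor modality.
        self.proprio_stem = nn.Linear(proprio_dim, token_dim)
        self.vision_stem = nn.Linear(vision_dim, token_dim)
        # Trunk: shared across embodiments; the part that is pre-trained.
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=8,
                                           batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)
        # Head: task-specific action decoder.
        self.head = nn.Linear(token_dim, action_dim)

    def forward(self, proprio, vision):
        # proprio: (B, T_p, proprio_dim); vision: (B, T_v, vision_dim)
        tokens = torch.cat([self.proprio_stem(proprio),
                            self.vision_stem(vision)], dim=1)
        latent = self.trunk(tokens)           # shared latent tokens
        return self.head(latent.mean(dim=1))  # -> (B, action_dim)

policy = HPTStylePolicy()
actions = policy(torch.randn(2, 4, 32), torch.randn(2, 12, 768))
```

The key design point is that only the stems and heads vary per embodiment and task; the trunk, which holds most of the parameters, is shared and reused.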

The scaling behaviors and design choices of policy pre-training were investigated using more than 50 individual data sources and models of over 1 billion parameters. The pre-training corpus incorporated embodied datasets spanning different embodiments, including real robots, simulations, and internet human videos. The results showed that the HPT framework works not only for costly real-world robot operation but also for other types of embodiments: it outperforms several baselines and improves fine-tuned policy performance by over 20% on unseen tasks across multiple simulator benchmarks and real-world settings.
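As a rough illustration of the transfer recipe, the following sketch continues the hypothetical `HPTStylePolicy` example above: a new stem and head are trained for a new embodiment while the pre-trained trunk is reused (here frozen, which is one possible adaptation choice, an assumption rather than the paper's exact procedure):

```python
# Hypothetical continuation of the sketch above: adapt to a new
# embodiment by training a fresh stem and head around a reused trunk.
policy = HPTStylePolicy(proprio_dim=14, action_dim=6)  # new embodiment

# Load pre-trained trunk weights (file name is illustrative only).
policy.trunk.load_state_dict(torch.load("pretrained_trunk.pt"))

# Freeze the shared trunk; only stems and head receive gradients.
for p in policy.trunk.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(
    [p for p in policy.parameters() if p.requires_grad], lr=1e-4)
```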

In conclusion, the proposed framework addresses heterogeneity and mitigates challenges in robotic learning by leveraging pre-trained models, showing significant improvements in generalization and performance across many robotic tasks and embodiments. Although the model architecture and training procedure work with different setups, pre-training on varied data can take longer to converge. This perspective can inspire future work on handling the heterogeneous nature of robotic data for robotic foundation models!


Check out the Paper, Project, and MIT Blog. All credit for this research goes to the researchers of this project.

