MIT News - Machine learning | November 20, 2024
Can robots learn from machine dreams?

Researchers at MIT CSAIL have developed a new robot training system called LucidSim, which uses generative AI and physics simulators to create diverse, realistic virtual training environments, helping robots reach expert-level performance on difficult tasks without any real-world data. LucidSim addresses the long-standing "sim-to-real" gap in robot learning: by combining physics simulation with generative AI models, it produces images conditioned on depth maps and semantic masks, as well as short videos, giving robots rich training data. Experiments show that robots trained with LucidSim can recognize and navigate obstacles in real environments and significantly outperform traditional methods, offering a more effective and more scalable approach to robot training.

🤔 LucidSim uses generative AI and physics simulators to create diverse, realistic virtual training environments, letting robots learn difficult tasks without any real-world data. It tackles the long-standing "sim-to-real" gap in robot learning: the disparity between simulated environments and the complex, unpredictable real world.

🖼️ LucidSim uses generative models to produce images conditioned on depth maps and semantic masks, with a physics simulator ensuring the images obey real-world physics, yielding visually realistic training data. These data carry the scene's geometric information and labels for its different parts, such as objects, the ground, and walls.

🎬 LucidSim also introduces a technique called "Dreams In Motion" that turns a single generated image into a short video, giving the robot small "experiences" and improving training efficiency. The technique computes the motion of each pixel between frames, accounting for the scene's 3D geometry and changes in the robot's viewpoint.

🤖 Results show that robots trained with LucidSim can recognize and navigate obstacles in real environments and significantly outperform traditional methods such as domain randomization, demonstrating that LucidSim effectively transfers what is learned in simulation to the real world.

📈 Robots trained with LucidSim saw large gains on difficult tasks: in quadruped locomotion and parkour, success rates rose from 15 percent to 88 percent. This suggests LucidSim can significantly accelerate the deployment of adaptable, intelligent robots and offers a more effective, more scalable approach to robot training.

For roboticists, one challenge towers above all others: generalization — the ability to create machines that can adapt to any environment or condition. Since the 1970s, the field has evolved from writing sophisticated programs to using deep learning, teaching robots to learn directly from human behavior. But a critical bottleneck remains: data quality. To improve, robots need to encounter scenarios that push the boundaries of their capabilities, operating at the edge of their mastery. This process traditionally requires human oversight, with operators carefully challenging robots to expand their abilities. As robots become more sophisticated, this hands-on approach hits a scaling problem: the demand for high-quality training data far outpaces humans’ ability to provide it.

Now, a team of MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers has developed a novel approach to robot training that could significantly accelerate the deployment of adaptable, intelligent machines in real-world environments. The new system, called “LucidSim,” uses recent advances in generative AI and physics simulators to create diverse and realistic virtual training environments, helping robots achieve expert-level performance in difficult tasks without any real-world data.

LucidSim combines physics simulation with generative AI models, addressing one of the most persistent challenges in robotics: transferring skills learned in simulation to the real world. “A fundamental challenge in robot learning has long been the ‘sim-to-real gap’ — the disparity between simulated training environments and the complex, unpredictable real world,” says MIT CSAIL postdoc Ge Yang, a lead researcher on LucidSim. “Previous approaches often relied on depth sensors, which simplified the problem but missed crucial real-world complexities.”

The multipronged system is a blend of different technologies. At its core, LucidSim uses large language models to generate various structured descriptions of environments. These descriptions are then transformed into images using generative models. To ensure that these images reflect real-world physics, an underlying physics simulator is used to guide the generation process.
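To make that flow concrete, here is a minimal Python sketch of the three-stage pipeline the paragraph describes (LLM scene descriptions, geometry from a physics simulator, and conditioned image generation). The staging follows the article; every callable and the `sim.render` interface are hypothetical placeholders, not the team's actual code.

```python
import numpy as np

def describe_scenes(llm, n: int) -> list[str]:
    """Stage 1: ask a large language model for n structured scene descriptions.
    `llm` is any callable str -> list[str]; a placeholder, not a real API."""
    return llm(f"Write {n} structured descriptions of obstacle-course scenes.")

def simulate_geometry(sim, description: str):
    """The physics simulator renders the scene's depth map and semantic mask,
    which will constrain image generation (hypothetical simulator API)."""
    depth, semantics = sim.render(description)
    return depth, semantics

def generate_view(image_model, prompt: str, depth, semantics) -> np.ndarray:
    """Stage 2: a generative image model, conditioned on the simulator's
    geometry so the picture reflects real-world physics (placeholder call)."""
    return image_model(prompt, depth=depth, mask=semantics)

def build_dataset(llm, sim, image_model, n: int):
    """Assemble (image, depth, semantics) training triples for the robot."""
    frames = []
    for prompt in describe_scenes(llm, n):
        depth, semantics = simulate_geometry(sim, prompt)
        frames.append((generate_view(image_model, prompt, depth, semantics),
                       depth, semantics))
    return frames
```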

The birth of an idea: From burritos to breakthroughs

The inspiration for LucidSim came from an unexpected place: a conversation outside Beantown Taqueria in Cambridge, Massachusetts. “We wanted to teach vision-equipped robots how to improve using human feedback. But then, we realized we didn’t have a pure vision-based policy to begin with,” says Alan Yu, an undergraduate student in electrical engineering and computer science (EECS) at MIT and co-lead author on LucidSim. “We kept talking about it as we walked down the street, and then we stopped outside the taqueria for about half an hour. That’s where we had our moment.”

To cook up their data, the team generated realistic images by extracting depth maps, which provide geometric information, and semantic masks, which label different parts of an image, from the simulated scene. They quickly realized, however, that with tight control over the composition of the image content, the model produced near-identical images from the same prompt. So, they devised a way to source diverse text prompts from ChatGPT.
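A sketch of that prompt-sourcing trick, assuming `llm` is any text-completion callable (the team used ChatGPT); the meta-prompt wording here is invented for illustration.

```python
# One meta-query yields n different scene prompts; feeding each to the image
# model avoids the near-duplicate outputs a single fixed prompt produced.
META_PROMPT = ("Write {n} distinct one-sentence descriptions of a scene with "
               "stairs and obstacles, varying weather, lighting, and materials.")

def diverse_prompts(llm, n: int = 20) -> list[str]:
    return llm(META_PROMPT.format(n=n)).splitlines()[:n]
```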

This approach, however, only resulted in a single image. To make short, coherent videos that serve as little "experiences" for the robot, the scientists hacked together some image magic, creating another novel technique called "Dreams In Motion." The system computes the movements of each pixel between frames to warp a single generated image into a short, multi-frame video, taking into account the 3D geometry of the scene and the relative changes in the robot's perspective.
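The per-pixel motion computation described here is essentially depth-based reprojection. Below is a minimal NumPy sketch of that general technique, not the authors' implementation: each pixel is unprojected into 3D using its simulator depth, moved by the relative camera pose, and projected back into the new view. Chaining a few such warps over successive simulated poses yields a short, multi-frame clip.

```python
import numpy as np

def warp_to_new_view(image, depth, K, T_rel):
    """Warp one generated image to the robot's next viewpoint.
    image: (H, W, 3) uint8, depth: (H, W) metric depth from the simulator,
    K: (3, 3) camera intrinsics, T_rel: (4, 4) relative camera pose."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T  # (3, H*W)
    # Unproject every pixel into 3D using its depth.
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    pts = np.vstack([pts, np.ones((1, pts.shape[1]))])            # homogeneous
    # Move the points into the next frame's camera and project back.
    proj = K @ (T_rel @ pts)[:3]
    z = np.maximum(proj[2], 1e-6)                                 # avoid /0
    uv = np.round(proj[:2] / z).astype(int)
    # Scatter source pixels to their new locations (nearest-neighbor warp;
    # disocclusion holes are simply left black in this sketch).
    out = np.zeros_like(image)
    ok = ((proj[2] > 0) & (uv[0] >= 0) & (uv[0] < W)
          & (uv[1] >= 0) & (uv[1] < H))
    out[uv[1, ok], uv[0, ok]] = image.reshape(-1, 3)[ok]
    return out
```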

“We outperform domain randomization, a method developed in 2017 that applies random colors and patterns to objects in the environment, which is still considered the go-to method these days,” says Yu. “While this technique generates diverse data, it lacks realism. LucidSim addresses both diversity and realism problems. It’s exciting that even without seeing the real world during training, the robot can recognize and navigate obstacles in real environments.”
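For contrast, domain randomization in its simplest form fits in a few lines: paint each semantic class a random flat color. This is an illustrative reconstruction of the baseline idea Yu describes, not the 2017 method's actual code.

```python
import numpy as np

def domain_randomize(semantics, rng=None):
    """Recolor a scene by giving every semantic class a random flat color:
    diverse training images, but visually nothing like the real world."""
    rng = np.random.default_rng() if rng is None else rng
    img = np.zeros((*semantics.shape, 3), dtype=np.uint8)
    for cls in np.unique(semantics):
        img[semantics == cls] = rng.integers(0, 256, size=3)
    return img
```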

The team is particularly excited about the potential of applying LucidSim to domains outside quadruped locomotion and parkour, their main test bed. One example is mobile manipulation, where a mobile robot is tasked with handling objects in an open area and where color perception is critical. “Today, these robots still learn from real-world demonstrations,” says Yang. “Although collecting demonstrations is easy, scaling a real-world robot teleoperation setup to thousands of skills is challenging because a human has to physically set up each scene. We hope to make this easier, thus qualitatively more scalable, by moving data collection into a virtual environment.”

Who's the real expert?

The team put LucidSim to the test against an alternative, where an expert teacher demonstrates the skill for the robot to learn from. The results were surprising: Robots trained by the expert struggled, succeeding only 15 percent of the time — and even quadrupling the amount of expert training data barely moved the needle. But when robots collected their own training data through LucidSim, the story changed dramatically. Just doubling the dataset size catapulted success rates to 88 percent. “And giving our robot more data monotonically improves its performance — eventually, the student becomes the expert,” says Yang.

“One of the main challenges in sim-to-real transfer for robotics is achieving visual realism in simulated environments,” says Stanford University assistant professor of electrical engineering Shuran Song, who wasn’t involved in the research. “The LucidSim framework provides an elegant solution by using generative models to create diverse, highly realistic visual data for any simulation. This work could significantly accelerate the deployment of robots trained in virtual environments to real-world tasks.”

From the streets of Cambridge to the cutting edge of robotics research, LucidSim is paving the way toward a new generation of intelligent, adaptable machines — ones that learn to navigate our complex world without ever setting foot in it.

Yu and Yang wrote the paper with four fellow CSAIL affiliates: Ran Choi, an MIT postdoc in mechanical engineering; Yajvan Ravan, an MIT undergraduate in EECS; John Leonard, the Samuel C. Collins Professor of Mechanical and Ocean Engineering in the MIT Department of Mechanical Engineering; and Phillip Isola, an MIT associate professor in EECS. Their work was supported, in part, by a Packard Fellowship, a Sloan Research Fellowship, the Office of Naval Research, Singapore’s Defence Science and Technology Agency, Amazon, MIT Lincoln Laboratory, and the National Science Foundation Institute for Artificial Intelligence and Fundamental Interactions. The researchers presented their work at the Conference on Robot Learning (CoRL) in early November.
