MIT News - Artificial Intelligence, December 13, 2024
Teaching a robot its limits, to complete open-ended tasks safely

 

Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a new method called PRoC3S, which uses vision models and large language models (LLMs) to guide robots in performing open-ended tasks safely and correctly. PRoC3S checks a plan's feasibility in a simulator and, if the plan is infeasible, generates a new one until it finds a plan the robot can execute. The method enables robots to perform a variety of tasks, such as writing letters, drawing stars, and sorting and placing blocks, and achieved high success rates in both simulated and real-world tests.

🤖 PRoC3S combines an LLM with vision models: the LLM handles high-level task planning, while vision models perceive the robot's surroundings and model its constraints, ensuring the robot understands its physical limits, such as its reachable range and obstacles to avoid.

🔄 Trial and error in simulation: PRoC3S's core mechanism is testing the LLM's long-horizon plans in a simulated environment. Through simulation, the system can evaluate a plan's feasibility and safety and, when necessary, iterate to generate new, more feasible plans.

🦾 Real-world application: PRoC3S performs well not only in simulation but has also been applied successfully to a real robotic arm, guiding it to arrange blocks in a straight line and to place blocks of different colors into matching bowls, demonstrating its potential in practical use.

🚀 Future outlook: the researchers plan to further improve PRoC3S, including using a more advanced physics simulator and scaling to more complex long-horizon tasks via more scalable data-search techniques. They also plan to apply PRoC3S to mobile robots, such as quadrupeds, for tasks that include walking and scanning the surroundings.

🌟 Broad applicability: PRoC3S could enable robots to perform more intricate tasks in dynamic environments such as homes, helping robots understand and reliably execute a wide range of tasks and advancing robotics in real-world applications.

If someone advises you to “know your limits,” they’re likely suggesting you do things like exercise in moderation. To a robot, though, the motto represents learning constraints, or limitations of a specific task within the machine’s environment, to do chores safely and correctly.

For instance, imagine asking a robot to clean your kitchen when it doesn’t understand the physics of its surroundings. How can the machine generate a practical multistep plan to ensure the room is spotless? Large language models (LLMs) can get it close, but if the model is only trained on text, it’s likely to miss out on key specifics about the robot’s physical constraints, like how far it can reach or whether there are nearby obstacles to avoid. Stick to LLMs alone, and you’re likely to end up cleaning pasta stains out of your floorboards.

To guide robots in executing these open-ended tasks, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) used vision models to see what’s near the machine and model its constraints. The team’s strategy involves an LLM sketching up a plan that’s checked in a simulator to ensure it’s safe and realistic. If that sequence of actions is infeasible, the language model will generate a new plan, until it arrives at one that the robot can execute.

This trial-and-error method, which the researchers call “Planning for Robots via Code for Continuous Constraint Satisfaction” (PRoC3S), tests long-horizon plans to ensure they satisfy all constraints, and enables a robot to perform such diverse tasks as writing individual letters, drawing a star, and sorting and placing blocks in different positions. In the future, PRoC3S could help robots complete more intricate chores in dynamic environments like houses, where they may be prompted to do a general chore composed of many steps (like “make me breakfast”).
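The propose-check-replan loop described above can be sketched in a few lines. This is a minimal illustrative skeleton, not the authors' actual implementation: `llm_propose_plan` and `simulate` are hypothetical stand-ins for the real LLM call and physics simulator, and the reach constraint is an invented toy example of the kind of feasibility check the article describes.

```python
def llm_propose_plan(task, feedback):
    """Stand-in for the LLM planner. In the real system this would be a
    language-model call that emits a parameterized action plan; here we
    simply incorporate simulator feedback by shrinking the target reach."""
    reach = 1.0 if feedback is None else feedback["max_reach"]
    return [("move_to", reach), ("draw", task)]

def simulate(plan, max_reach=0.5):
    """Stand-in simulator: check each action against a toy reach constraint.
    Returns (feasible, feedback); feedback tells the planner what failed."""
    for action, param in plan:
        if action == "move_to" and param > max_reach:
            return False, {"max_reach": max_reach}
    return True, None

def plan_until_feasible(task, max_attempts=10):
    """Core trial-and-error loop: propose a plan, test it in simulation,
    and replan with feedback until a feasible plan is found (or give up)."""
    feedback = None
    for _ in range(max_attempts):
        plan = llm_propose_plan(task, feedback)
        feasible, feedback = simulate(plan)
        if feasible:
            return plan
    return None
```

In this sketch, the first proposed plan reaches too far, the simulator rejects it and reports the constraint, and the second proposal succeeds — mirroring the article's description of iterating until the robot can execute the plan.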

“LLMs and classical robotics systems like task and motion planners can’t execute these kinds of tasks on their own, but together, their synergy makes open-ended problem-solving possible,” says PhD student Nishanth Kumar SM ’24, co-lead author of a new paper about PRoC3S. “We’re creating a simulation on-the-fly of what’s around the robot and trying out many possible action plans. Vision models help us create a very realistic digital world that enables the robot to reason about feasible actions for each step of a long-horizon plan.”

The team presented its work this past month in a paper at the Conference on Robot Learning (CoRL) in Munich, Germany.

The researchers’ method uses an LLM pre-trained on text from across the internet. Before asking PRoC3S to do a task, the team provided their language model with a sample task (like drawing a square) that’s related to the target one (drawing a star). The sample task includes a description of the activity, a long-horizon plan, and relevant details about the robot’s environment.
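The few-shot prompting setup described above can be sketched as assembling an example task, its plan, and the environment description into a single prompt. The function and field names below are illustrative assumptions, not the authors' actual prompt format.

```python
def build_prompt(sample, target_description, environment):
    """Assemble a hypothetical few-shot planning prompt: a related sample
    task (description + long-horizon plan) plus details of the robot's
    environment, followed by the target task for the LLM to plan."""
    return "\n".join([
        "You control a robot arm. Emit a long-horizon action plan.",
        f"Example task: {sample['description']}",
        f"Example plan: {sample['plan']}",
        f"Environment: {environment}",
        f"New task: {target_description}",
    ])

# Usage: a square-drawing example primes the model to plan a star.
sample = {
    "description": "drawing a square",
    "plan": "move_to(corner); draw_line(...) repeated 4 times",
}
prompt = build_prompt(sample, "drawing a star",
                      "table surface, pen held in gripper")
```

The sample task gives the model a template for the plan format, so its output for the target task can be parsed and handed to the simulator for checking.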

But how did these plans fare in practice? In simulations, PRoC3S successfully drew stars and letters eight out of 10 times each. It also could stack digital blocks in pyramids and lines, and place items with accuracy, like fruits on a plate. Across each of these digital demos, the CSAIL method completed the requested task more consistently than comparable approaches like “LLM3” and “Code as Policies”.

The CSAIL engineers next brought their approach to the real world. Their method developed and executed plans on a robotic arm, teaching it to put blocks in straight lines. PRoC3S also enabled the machine to place blue and red blocks into matching bowls and move all objects near the center of a table.

Kumar and co-lead author Aidan Curtis SM ’23, who’s also a PhD student working in CSAIL, say these findings indicate how an LLM can develop safer plans that humans can trust to work in practice. The researchers envision a home robot that can be given a more general request (like “bring me some chips”) and reliably figure out the specific steps needed to execute it. PRoC3S could help a robot test out plans in an identical digital environment to find a working course of action — and more importantly, bring you a tasty snack.

For future work, the researchers aim to improve results using a more advanced physics simulator and to expand to more elaborate longer-horizon tasks via more scalable data-search techniques. Moreover, they plan to apply PRoC3S to mobile robots such as a quadruped for tasks that include walking and scanning surroundings.

“Using foundation models like ChatGPT to control robot actions can lead to unsafe or incorrect behaviors due to hallucinations,” says The AI Institute researcher Eric Rosen, who isn’t involved in the research. “PRoC3S tackles this issue by leveraging foundation models for high-level task guidance, while employing AI techniques that explicitly reason about the world to ensure verifiably safe and correct actions. This combination of planning-based and data-driven approaches may be key to developing robots capable of understanding and reliably performing a broader range of tasks than currently possible.”

Kumar and Curtis’ co-authors are also CSAIL affiliates: MIT undergraduate researcher Jing Cao and MIT Department of Electrical Engineering and Computer Science professors Leslie Pack Kaelbling and Tomás Lozano-Pérez. Their work was supported, in part, by the National Science Foundation, the Air Force Office of Scientific Research, the Office of Naval Research, the Army Research Office, MIT Quest for Intelligence, and The AI Institute.

