MIT News - Machine Learning | April 24, 12:19
Robotic system zeroes in on objects most relevant for helping humans

MIT researchers have developed a new method called "Relevance" that lets robots interact with humans more effectively. By analyzing audio and visual cues from the environment, the method quickly determines a human's objective and identifies the objects most relevant to it. In experiments, the robot accurately predicted human objectives and offered help while reducing collisions, enabling safer and more efficient human-robot collaboration. The researchers hope to apply the technique in settings such as smart manufacturing and warehousing for more natural, fluid human-robot interaction.

🤖️ **Core idea:** The "Relevance" method mimics the brain's Reticular Activating System (RAS): by filtering out irrelevant information, it helps the robot focus on the parts of a scene most relevant to the human's objective.

👂 **Information intake:** The robot gathers audio and visual information through a microphone and camera and feeds it into an AI toolkit, including a large language model (LLM) that processes spoken conversation to pick out keywords and phrases, plus algorithms that detect and classify objects, humans, physical actions, and task objectives.

☕️ **Objective inference:** The system runs a "trigger check" to decide whether anything important is happening. Once a human is detected, the robot enters its Relevance phase, using the AI toolkit's predictions to determine which features of the environment are most relevant to the human's objective; for example, on detecting "coffee," it prioritizes "cups" and "creamer."

🤝 **In practice:** The researchers tested the system in an experiment simulating a conference breakfast buffet. The robot accurately identified objectives from people's actions and conversations and offered help, such as handing over milk and a stir stick. The method also made the robot safer, reducing collisions.

💡 **Outlook:** The team hopes to apply the system in workplace and warehouse settings as well as in the home, enabling more natural, fluid human-robot interaction, such as delivering coffee while someone reads or handing over detergent during laundry.

For a robot, the real world is a lot to take in. Making sense of every data point in a scene can take a huge amount of computational effort and time. Using that information to then decide how to best help a human is an even thornier exercise.

Now, MIT roboticists have a way to cut through the data noise, to help robots focus on the features in a scene that are most relevant for assisting humans.

Their approach, which they aptly dub “Relevance,” enables a robot to use cues in a scene, such as audio and visual information, to determine a human’s objective and then quickly identify the objects that are most likely to be relevant in fulfilling that objective. The robot then carries out a set of maneuvers to safely offer the relevant objects or actions to the human.

The researchers demonstrated the approach with an experiment that simulated a conference breakfast buffet. They set up a table with various fruits, drinks, snacks, and tableware, along with a robotic arm outfitted with a microphone and camera. Applying the new Relevance approach, they showed that the robot was able to correctly identify a human’s objective and appropriately assist them in different scenarios.

In one case, the robot took in visual cues of a human reaching for a can of prepared coffee, and quickly handed the person milk and a stir stick. In another scenario, the robot picked up on a conversation between two people talking about coffee, and offered them a can of coffee and creamer.

Overall, the robot was able to predict a human’s objective with 90 percent accuracy and to identify relevant objects with 96 percent accuracy. The method also improved a robot’s safety, reducing the number of collisions by more than 60 percent, compared to carrying out the same tasks without applying the new method.

“This approach of enabling relevance could make it much easier for a robot to interact with humans,” says Kamal Youcef-Toumi, professor of mechanical engineering at MIT. “A robot wouldn’t have to ask a human so many questions about what they need. It would just actively take information from the scene to figure out how to help.”

Youcef-Toumi’s group is exploring how robots programmed with Relevance can help in smart manufacturing and warehouse settings, where they envision robots working alongside and intuitively assisting humans.

Youcef-Toumi, along with graduate students Xiaotong Zhang and Dingcheng Huang, will present their new method at the IEEE International Conference on Robotics and Automation (ICRA) in May. The work builds on another paper presented at ICRA the previous year.

Finding focus

The team’s approach is inspired by our own ability to gauge what’s relevant in daily life. Humans can filter out distractions and focus on what’s important, thanks to a region of the brain known as the Reticular Activating System (RAS). The RAS is a bundle of neurons in the brainstem that acts subconsciously to prune away unnecessary stimuli, so that a person can consciously perceive the relevant stimuli. The RAS helps to prevent sensory overload, keeping us, for example, from fixating on every single item on a kitchen counter, and instead helping us to focus on pouring a cup of coffee.

“The amazing thing is, these groups of neurons filter everything that is not important, and then it has the brain focus on what is relevant at the time,” Youcef-Toumi explains. “That’s basically what our proposition is.”

He and his team developed a robotic system that broadly mimics the RAS’s ability to selectively process and filter information. The approach consists of four main phases. The first is a watch-and-learn “perception” stage, during which a robot takes in audio and visual cues, for instance from a microphone and camera, that are continuously fed into an AI “toolkit.” This toolkit can include a large language model (LLM) that processes audio conversations to identify keywords and phrases, and various algorithms that detect and classify objects, humans, physical actions, and task objectives. The AI toolkit is designed to run continuously in the background, similarly to the subconscious filtering that the brain’s RAS performs.
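
To make this stage concrete, here is a minimal Python sketch of the perception loop. The helpers `transcribe_and_extract_keywords` and `detect_and_classify` are hypothetical stand-ins for the LLM and vision models described above; the team's actual components are not public, so these simply return canned predictions.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the LLM and vision models; they return
# canned predictions purely for illustration.
def transcribe_and_extract_keywords(audio_frame) -> list[str]:
    return ["coffee"]  # e.g. the LLM overheard a conversation about coffee

def detect_and_classify(video_frame) -> list[dict]:
    return [
        {"class": "cup",     "position": (0.80, 0.10)},
        {"class": "creamer", "position": (0.55, 0.30)},
        {"class": "person",  "position": (1.00, 0.00)},
    ]

@dataclass
class ToolkitState:
    """Rolling output of the background AI toolkit."""
    keywords: list[str] = field(default_factory=list)
    detections: list[dict] = field(default_factory=list)

def perceive(audio_frame, video_frame, state: ToolkitState) -> ToolkitState:
    """Phase 1: continuously fold fresh audio and visual cues into the
    toolkit state, akin to the RAS-like background filtering."""
    state.keywords = transcribe_and_extract_keywords(audio_frame)
    state.detections = detect_and_classify(video_frame)
    return state
```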

The second stage is a “trigger check” phase, which is a periodic check that the system performs to assess if anything important is happening, such as whether a human is present or not. If a human has stepped into the environment, the system’s third phase will kick in. This phase is the heart of the team’s system, which acts to determine the features in the environment that are most likely relevant to assist the human.
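
A sketch of that trigger check, and of an outer loop tying the phases together, might look as follows. `infer_objective`, `rank_relevance`, and `plan_and_assist` are filled in by the next two sketches, and the half-second period is an assumed value, not one reported by the team.

```python
import time

def trigger_check(state: ToolkitState) -> bool:
    """Phase 2: a cheap periodic test for whether anything important
    has happened; here the only trigger is a detected person."""
    return any(d["class"] == "person" for d in state.detections)

def human_position(state: ToolkitState) -> tuple[float, float]:
    return next(d["position"] for d in state.detections
                if d["class"] == "person")

def run_loop(read_sensors, state: ToolkitState, period_s: float = 0.5):
    """Perceive continuously, but engage the costlier relevance and
    planning phases only after a trigger fires."""
    while True:
        # read_sensors() is assumed to return (audio_frame, video_frame)
        state = perceive(*read_sensors(), state)
        if trigger_check(state):
            objective = infer_objective(state)
            ranked = rank_relevance(state, objective, human_position(state))
            plan_and_assist(ranked)
        time.sleep(period_s)
```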

To establish relevance, the researchers developed an algorithm that takes in real-time predictions made by the AI toolkit. For instance, the toolkit’s LLM may pick up the keyword “coffee,” and an action-classifying algorithm may label a person reaching for a cup as having the objective of “making coffee.” The team’s Relevance method would factor in this information to first determine the “class” of objects that have the highest probability of being relevant to the objective of “making coffee.” This might automatically filter out classes such as “fruits” and “snacks,” in favor of “cups” and “creamers.” The algorithm would then further filter within the relevant classes to determine the most relevant “elements.” For instance, based on visual cues of the environment, the system may label a cup closest to a person as more relevant — and helpful — than a cup that is farther away.
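
A minimal version of that two-level filtering could look like this. The objective-to-class table and the one-keyword objective rule are assumptions standing in for the learned predictions the article describes, and distance to the human stands in for its richer visual cues.

```python
import math

# Assumed objective -> relevant-class table; the actual system estimates
# class probabilities rather than using a fixed lookup.
RELEVANT_CLASSES = {
    "making coffee": {"cup", "creamer", "milk", "stir stick"},
}

def infer_objective(state: ToolkitState) -> str | None:
    """Combine toolkit cues into an objective label; a one-keyword rule
    stands in for the LLM plus action classifier described above."""
    return "making coffee" if "coffee" in state.keywords else None

def rank_relevance(state: ToolkitState, objective: str | None,
                   human_pos: tuple[float, float]) -> list[dict]:
    """Phase 3: drop object classes irrelevant to the objective, then
    rank the surviving elements, closest to the human first."""
    classes = RELEVANT_CLASSES.get(objective, set())
    candidates = [d for d in state.detections if d["class"] in classes]
    return sorted(candidates,
                  key=lambda d: math.dist(d["position"], human_pos))
```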

In the fourth and final phase, the robot would then take the identified relevant objects and plan a path to physically access and offer the objects to the human.
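
In sketch form, that final phase reduces to fetching the ranked items in order; `ArmStub` is a hypothetical placeholder for a real manipulator driver with a collision-aware motion planner.

```python
class ArmStub:
    """Hypothetical stand-in for a manipulator driver; a real plan_path
    would route around people and clutter."""
    def plan_path(self, to):
        return [to]
    def follow(self, path):
        print(f"moving along {path}")
    def offer(self, label):
        print(f"offering the {label}")

arm = ArmStub()

def plan_and_assist(ranked: list[dict]) -> None:
    """Phase 4: fetch and offer the most relevant items in order."""
    for item in ranked:
        path = arm.plan_path(to=item["position"])
        arm.follow(path)
        arm.offer(item["class"])
```

With the canned detections from the first sketch, the cup sits closer to the person than the creamer does, so it would be ranked and offered first, loosely mirroring the buffet demonstration.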

Helper mode

The researchers tested the new system in experiments that simulate a conference breakfast buffet. They chose this scenario based on the publicly available Breakfast Actions Dataset, which comprises videos and images of typical activities that people perform during breakfast time, such as preparing coffee, cooking pancakes, making cereal, and frying eggs. Actions in each video and image are labeled, along with the overall objective (frying eggs, versus making coffee).

Using this dataset, the team tested various algorithms in their AI toolkit, such that, when receiving actions of a person in a new scene, the algorithms could accurately label and classify the human tasks and objectives, and the associated relevant objects.

In their experiments, they set up a robotic arm and gripper and instructed the system to assist humans as they approached a table filled with various drinks, snacks, and tableware. They found that when no humans were present, the robot’s AI toolkit operated continuously in the background, labeling and classifying objects on the table.

When, during a trigger check, the robot detected a human, it snapped to attention, turning on its Relevance phase and quickly identifying objects in the scene that were most likely to be relevant, based on the human’s objective, which was determined by the AI toolkit.

“Relevance can guide the robot to generate seamless, intelligent, safe, and efficient assistance in a highly dynamic environment,” says co-author Zhang.

Going forward, the team hopes to apply the system to scenarios that resemble workplace and warehouse environments, as well as to other tasks and objectives typically performed in household settings.

“I would want to test this system in my home to see, for instance, if I’m reading the paper, maybe it can bring me coffee. If I’m doing laundry, it can bring me a laundry pod. If I’m doing repair, it can bring me a screwdriver,” Zhang says. “Our vision is to enable human-robot interactions that can be much more natural and fluent.”

This research was made possible by the support and partnership of King Abdulaziz City for Science and Technology (KACST) through the Center for Complex Engineering Systems at MIT and KACST.
