Communications of the ACM - Artificial Intelligence, April 24, 21:48
How Liquid Networks Make Robots Smarter

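Daniela Rus and her team at the Massachusetts Institute of Technology (MIT) are developing a new kind of artificial intelligence they call “physical AI,” which aims to bridge the gap between AI and robotics. At its core are “liquid networks,” which draw on principles from neuroscience and let robots make better use of AI’s ability to understand text, images, and video, so they can make smarter decisions in the real world. Compared with conventional AI models, liquid networks are more compact and more efficient, and they can learn cause-and-effect relationships, which makes them better suited to robotics. The team’s experiments show that liquid networks perform well in self-driving and drone tasks and stay reliable when the environment changes. The researchers are also bringing language into the robot’s control loop: by connecting language and images, they enable robots to understand and carry out more complex instructions, such as recognizing and avoiding different objects.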

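🧠 The “physical AI” developed by Rus’s team aims to merge AI and robotics, using AI’s capacity for understanding to make robots smarter.

💡 “Liquid networks” are the core technology of physical AI. Their design is inspired by neuroscience, and they are compact, efficient, and able to learn cause and effect.

🚗 In the self-driving experiment, a liquid network needed only 90 neurons to attend to the road the way a human driver does, whereas a conventional deep neural network needed tens of thousands of neurons.

🌳 In the drone experiment, the liquid network still found its targets as the environment changed across seasons, while the other models got confused.

🗣️ The team is bringing language into robot control: by linking language with images, they enable robots to understand more abstract instructions, such as recognizing and avoiding specific objects.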

When Daniela Rus and her collaborators looked at how a deep neural network made decisions in the vision system of their laboratory’s self-driving car, they noticed that its attention was focused on the entire image, even the bushes and trees at the side of the road. “But that’s not how people drive,” said Rus in her office in the Massachusetts Institute of Technology (MIT)’s Computer Science and Artificial Intelligence Laboratory (CSAIL), of which she is the director. “We usually look at the road horizon and the sides of the road.”

Traditionally AI and robotics have been largely two separate fields, Rus explained. “AI has been amazing us with its decision-making and reasoning, but it is confined in the digital space. Robots have physical presence but are generally pre-programmed and not intelligent. We are aiming to bridge the separation between AI and robots by developing what I call ‘physical AI’. Physical AI uses AI’s power to understand text, images, and video to make a real-world machine smarter. And those machines can be any physical platform: a sensor, a robot, or a power grid.”

Adapting current AI solutions to robots poses major challenges in power consumption, computing power, and data exchange. AI solutions typically require huge server farms that do not fit on the bodies of robots, and a safety-critical system can’t rely on cloud connections. Furthermore, AI still sometimes makes silly mistakes that are unacceptable in safety-critical tasks.

Rus offered the example of pedestrian detection by self-driving cars: “Although today’s AI is very good at detecting individual pedestrians, it is not so good at detecting groups of pedestrians, because they have an amorphous, not clearly defined shape.”

Another problem is that current transformer-based AI models rely on next-token prediction, which works by identifying statistical patterns in the data, but they lack a deeper understanding of the causal relationships that underlie those patterns. Explained Rus, “If you have a model that correlates fire with heat, that model does not inherently understand the physical processes of combustion and fire. We really need grounding in physical, causal, and temporal realities, otherwise the AI models struggle to make sensible predictions about the real world.”

To tackle those challenges, Rus and her team have developed what they call “liquid networks,” which she described as “a physics-based technology for neural networks whose mathematical equations are inspired by what neuroscientists know about the nematode C. elegans, a one-millimeter-long worm which has a good life with only 302 neurons.

“Unlike for the traditional artificial neuron, the output of a liquid network neuron is not a binary number, 0 or 1, but it is given by a function governed by a differential equation. Furthermore, the connections between the neurons in a liquid network are more than the simple weights in traditional neural networks; they also are governed by functions inspired by neuroscience. In addition, we also change the architecture of the network so it is not a feed-forward architecture like in transformer models, but it includes recurrences which support adaptation.”

These differences allow the MIT team to prove that liquid networks are causal, meaning they learn to associate cause and effect. Moreover, liquid networks are compact, can be trained efficiently, and are efficient at inference. These properties make them suitable for real-world applications such as robots.
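As a rough illustration of a neuron whose output is governed by a differential equation rather than a fixed activation, the sketch below integrates a single neuron of the form dx/dt = -(1/tau + f) x + f A with explicit Euler steps, where the gate f depends on both the neuron's state and its input. The article does not give the team's equations; this equation form, the names `ltc_step` and `simulate`, and every parameter value are assumptions chosen for illustration, not the MIT implementation.

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function, used here as the input-dependent gate."""
    return 1.0 / (1.0 + math.exp(-z))


def ltc_step(x, inp, dt, tau=1.0, A=1.0, w_x=0.5, w_in=1.0, bias=0.0):
    """One explicit-Euler step of a single liquid-style neuron (assumed form).

    dx/dt = -(1/tau + f) * x + f * A,  with  f = sigmoid(w_x*x + w_in*inp + bias)

    All weights and constants are hypothetical illustration values.
    """
    f = sigmoid(w_x * x + w_in * inp + bias)
    dxdt = -(1.0 / tau + f) * x + f * A
    return x + dt * dxdt


def simulate(inputs, dt=0.05, x0=0.0, **params):
    """Feed a sequence of scalar inputs to the neuron and record its state."""
    x = x0
    trace = []
    for u in inputs:
        x = ltc_step(x, u, dt, **params)
        trace.append(x)
    return trace


if __name__ == "__main__":
    strong = simulate([5.0] * 200)   # sustained strong input
    weak = simulate([-5.0] * 200)    # sustained weak input
    print(f"final state, strong input: {strong[-1]:.3f}")
    print(f"final state, weak input:   {weak[-1]:.3f}")
```

In this toy setting the state is a continuous value that evolves over time, and because the gate f depends on the input, both how fast the neuron responds and where it settles shift with what it senses.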

When Rus and her colleagues swapped the deep neural network, which contained tens of thousands of neurons, for their newly developed liquid network in their self-driving car experiment, they required only 90 liquid neurons. Furthermore, the attention of the liquid network was focused on the road horizon and the sides of the road, just like human drivers. Said Rus, “It looks like these liquid networks learn the task rather than the context of the task. We are now working to mathematically characterize this.”

A second practical example showed the benefit of liquid networks for robots. This was a drone experiment in which the drone had to find red objects in a forest. Rus and her team trained three different models to do this task: recurrent neural networks, deep neural networks, and liquid networks. Rus said the researchers showed each model “unlabeled videos that were all shot in summer. All the models learned to find the objects in the real forest when it was summer. But when the background changed in autumn and winter, when the leaves turned brown and later fell off, only the liquid network managed to find the objects in the forest. The other models got confused by the new background.”

She added, “Our solution works even in an urban environment, although it was not explicitly trained for this.”

Another line of research that Rus is excited about for making robots smarter was inspired by the development of large language models in recent years. “We are bringing language into the control loop of a robot,” she said. “Language is so important for intelligence. It helps us share knowledge. It helps us reason at higher levels of abstraction.”

In one of their recent projects, her team developed a foundational model for driving that connects language and images in the same latent representation. They then trained self-driving cars to avoid deer in the woods in summer.

Said Rus, “After training the car with videos, it had learned to avoid deer. Then we could ask it in words to also avoid sheep, bicyclists, trees, people, etc., without giving it additional videos for training. It was able to do so by connecting the text concept with the visual representation of that concept. So, language allows us to increase the capabilities of the car and also elevated the level of abstraction at which we were able to talk with the car.”
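The idea of connecting a text concept with its visual representation can be sketched with a toy shared latent space: if words and image regions map to vectors in the same space, a new word can be matched against what the camera sees without additional training videos. The article does not describe the model's internals, so everything below is an assumption for illustration: the three-dimensional embeddings are made up, and `TEXT_EMBEDDINGS`, `cosine`, `should_avoid`, and the 0.8 threshold are hypothetical names and values.

```python
import math

# Hypothetical text embeddings in a toy 3-dimensional shared latent space.
# A real system would obtain these from a trained language-image model.
TEXT_EMBEDDINGS = {
    "deer": [0.9, 0.1, 0.0],
    "sheep": [0.1, 0.9, 0.0],
    "tree": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def should_avoid(image_embedding, avoid_words, threshold=0.8):
    """Return True if the image embedding is close to any word to avoid."""
    return any(
        cosine(image_embedding, TEXT_EMBEDDINGS[word]) >= threshold
        for word in avoid_words
    )


if __name__ == "__main__":
    # Made-up embedding of an image region that looks like a sheep.
    sheep_like_region = [0.2, 0.8, 0.1]
    print(should_avoid(sheep_like_region, ["deer"]))
    print(should_avoid(sheep_like_region, ["deer", "sheep"]))
```

Adding the word “sheep” to the avoid list changes the decision for the same image region, which mirrors, in a very simplified way, asking the car in words to avoid a new kind of object.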

Bennie Mols is a science and technology writer based in Amsterdam, the Netherlands.
