When Daniela Rus and her collaborators looked at how a deep neural network made decisions in the vision system of their laboratory’s self-driving car, they noticed that its attention was spread across the entire image, including the bushes and trees at the side of the road. “But that’s not how people drive,” said Rus in her office at the Massachusetts Institute of Technology (MIT)’s Computer Science and Artificial Intelligence Laboratory (CSAIL), of which she is the director. “We usually look at the road horizon and the sides of the road.”
Traditionally, AI and robotics have largely been two separate fields, Rus explained. “AI has been amazing us with its decision-making and reasoning, but it is confined to the digital space. Robots have physical presence but are generally pre-programmed and not intelligent. We are aiming to bridge the separation between AI and robots by developing what I call ‘physical AI’. Physical AI uses AI’s power to understand text, images, and video to make a real-world machine smarter. And those machines can be any physical platform: a sensor, a robot, or a power grid.”
Trying to adapt current AI solutions for robots leads to huge challenges in terms of power consumption, computing power, and data exchange. AI solutions typically require enormous server farms that do not fit on the bodies of robots, and a safety-critical system cannot rely on cloud connections. Furthermore, AI still sometimes makes silly mistakes that are unacceptable in safety-critical tasks.
Rus offered the example of pedestrian detection by self-driving cars: “Although today’s AI is very good at detecting individual pedestrians, it is not so good at detecting groups of pedestrians, because they have an amorphous, not clearly defined shape.”
Another problem is that current transformer-based AI models rely on next-token prediction based on statistical patterns identified in the data, but they lack a deeper understanding of the causal relationships that underlie those patterns. Explained Rus, “If you have a model that correlates fire with heat, that model does not inherently understand the physical processes of combustion and fire. We really need grounding in physical, causal, and temporal realities; otherwise, the AI models struggle to make sensible predictions about the real world.”
To tackle those challenges, Rus and her team have developed what they call “liquid networks,” which she described as “a physics-based technology for neural networks whose mathematical equations are inspired by what neuroscientists know about the nematode C. elegans, a one-millimeter-long worm which has a good life with only 302 neurons.
“Unlike the traditional artificial neuron, whose output is a binary number, 0 or 1, the output of a liquid network neuron is given by a function governed by a differential equation. Furthermore, the connections between the neurons in a liquid network are more than the simple weights of traditional neural networks; they, too, are governed by functions inspired by neuroscience. In addition, we change the architecture of the network so it is not a feed-forward architecture like in transformer models, but includes recurrences that support adaptation.”
These differences allowed the MIT team to prove that liquid networks are causal, meaning they learn to associate cause and effect. Moreover, liquid networks are compact, can be trained efficiently, and are efficient at inference, all properties that make them suitable for real-world applications such as robots.
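To make the contrast with a standard artificial neuron more concrete, here is a minimal sketch of the kind of dynamics Rus describes: each neuron’s state follows a differential equation whose effective time constant is modulated by a bounded function of the inputs and of the recurrent state. The simplified formulation, the parameter names (ltc_step, W_in, W_rec, and so on), and the explicit Euler solver are illustrative assumptions for this article, not the MIT team’s actual equations or code.

```python
import numpy as np

def ltc_step(x, u, dt, tau, W_in, W_rec, b, A):
    """One explicit-Euler step of a simplified liquid time-constant layer.

    x     : hidden state, one value per neuron, shape (n,)
    u     : external input at this time step, shape (m,)
    tau   : per-neuron base time constants, shape (n,)
    W_in  : input weights, shape (n, m)
    W_rec : recurrent weights (the recurrences that support adaptation), shape (n, n)
    b, A  : per-neuron gate bias and the target the gate drives the state toward
    """
    # Bounded nonlinearity of the inputs and the current state; it both drives
    # the state toward A and modulates the effective time constant, which is
    # what makes the dynamics "liquid".
    f = 1.0 / (1.0 + np.exp(-(W_in @ u + W_rec @ x + b)))
    dxdt = -(1.0 / tau + f) * x + f * A   # differential equation for the state
    return x + dt * dxdt                  # explicit Euler update

# Illustrative usage: a tiny layer of four liquid neurons driven by three inputs.
rng = np.random.default_rng(0)
n, m = 4, 3
x = np.zeros(n)
W_in = rng.normal(size=(n, m))
W_rec = rng.normal(size=(n, n))
b, A = rng.normal(size=n), rng.normal(size=n)
tau = np.ones(n)

for _ in range(200):
    u = rng.normal(size=m)                # stand-in for sensor readings
    x = ltc_step(x, u, dt=0.05, tau=tau, W_in=W_in, W_rec=W_rec, b=b, A=A)
print(x)
```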
When Rus and her colleagues swapped the deep neural network, which contained tens of thousands of neurons, for their newly developed liquid network in their self-driving car experiment, they required only 90 liquid neurons. Furthermore, the attention of the liquid network was focused on the road horizon and the sides of the road, just like human drivers. Said Rus, “It looks like these liquid networks learn the task rather than the context of the task. We are now working to mathematically characterize this.”
A second practical example showed the benefit of liquid networks for robots. This was a drone experiment in which the drone had to find red objects in a forest. Rus and her team trained three different models to do this task: recurrent neural networks, deep neural networks, and liquid networks. Rus said the researchers showed each model “unlabeled videos that were all shot in summer. All the models learned to find the objects in the real forest when it was summer. But when the background changed in autumn and winter, when the leaves turned brown and later fell off, only the liquid network managed to find the objects in the forest. The other models got confused by the new background.”
She added, “Our solution works even in an urban environment, although it was not explicitly trained for this.”
Another line of research Rus is excited about for making robots smarter was inspired by the development of large language models in recent years. “We are bringing language into the control loop of a robot,” she said. “Language is so important for intelligence. It helps us share knowledge. It helps us reason at higher levels of abstraction.”
In one of their recent projects, her team developed a foundation model for driving that connects language and images in the same latent representation. They then trained self-driving cars to avoid deer in the woods in summer.
Said Rus, “After training the car with videos, it had learned to avoid deer. Then we could ask it in words to also avoid sheep, bicyclists, trees, people, etc., without giving it additional videos for training. It was able to do so by connecting the text concept with the visual representation of that concept. So, language allows us to increase the capabilities of the car and also elevates the level of abstraction at which we are able to talk with the car.”
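One way to picture what connecting language and images in the same latent representation buys: if the camera frame and each concept to avoid are embedded into one vector space, new categories can be added as words and compared to the current frame by similarity, with no new training videos. The sketch below only illustrates that idea; the encoders are deterministic stand-ins, and the function names (encode_text, encode_image, should_avoid) and the similarity threshold are hypothetical, not the team’s foundation model.

```python
import numpy as np

LATENT_DIM = 64

def _stub_embedding(key: str) -> np.ndarray:
    """Deterministic stand-in for a learned encoder (illustration only)."""
    rng = np.random.default_rng(abs(hash(key)) % (2**32))
    v = rng.normal(size=LATENT_DIM)
    return v / np.linalg.norm(v)

def encode_text(phrase: str) -> np.ndarray:
    # Placeholder for the text branch of a pretrained vision-language model.
    return _stub_embedding("text:" + phrase)

def encode_image(frame_id: str) -> np.ndarray:
    # Placeholder for the image branch, projecting camera frames into the
    # same latent space as the text encoder.
    return _stub_embedding("image:" + frame_id)

def should_avoid(frame_id: str, avoid_phrases, threshold: float = 0.3) -> bool:
    """True if the frame is close, in the shared latent space, to any concept to avoid."""
    img = encode_image(frame_id)
    # Both embeddings are unit vectors, so the dot product is cosine similarity.
    return any(float(img @ encode_text(p)) > threshold for p in avoid_phrases)

# Extending the car's behavior purely through language, with no new training videos:
avoid = ["a deer on the road", "a sheep", "a bicyclist", "a pedestrian"]
print(should_avoid("camera_frame_001", avoid))
```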
Bennie Mols is a science and technology writer based in Amsterdam, the Netherlands.