Top AI researchers say language is limiting. Here's the new kind of model they are building instead.

Fei-Fei Li, a pioneer in AI research, is working to develop a "world" model, which trains on data beyond just language.

Top AI researchers like Fei-Fei Li and Yann LeCun are thinking about AI beyond LLMs. At World Labs, Li is focused on building world models. LeCun is building them at Meta. World models mimic the mental constructs humans make in their heads.

As OpenAI, Anthropic, and Big Tech invest billions in developing state-of-the-art large language models, a small group of AI researchers is working on the next big thing.

Computer scientists like Fei-Fei Li, the Stanford professor famous for creating ImageNet, and Yann LeCun, Meta's chief AI scientist, are building what they call "world models."

Unlike large language models, which determine outputs based on statistical relationships between the words and phrases in their training data, world models predict events based on the mental constructs that humans make of the world around them.

"Language doesn't exist in nature," Li said on a recent episode of Andreessen Horowitz's a16z podcast. "Humans," she said, "not only do we survive, live, and work, but we build civilization beyond language."

In his 1971 paper "Counterintuitive Behavior of Social Systems," computer scientist and MIT professor Jay Wright Forrester explained why mental models are crucial to human behavior:

Each of us uses models constantly. Every person in private life and in business instinctively uses models for decision making. The mental images in one's head about one's surroundings are models. One's head does not contain real families, businesses, cities, governments, or countries. One uses selected concepts and relationships to represent real systems. A mental image is a model. All decisions are taken on the basis of models. All laws are passed on the basis of models. All executive actions are taken on the basis of models. The question is not to use or ignore models. The question is only a choice among alternative models.

If AI is to meet or surpass human intelligence, then the researchers behind it believe it should be able to make mental models, too.

Li has been working on this through World Labs, which she cofounded in 2024 with an initial backing of $230 million from venture firms like Andreessen Horowitz, New Enterprise Associates, and Radical Ventures. "We aim to lift AI models from the 2D plane of pixels to full 3D worlds — both virtual and real — endowing them with spatial intelligence as rich as our own," World Labs says on its website.

Li said on the No Priors podcast that spatial intelligence is "the ability to understand, reason, interact, and generate 3D worlds," given that the world is fundamentally three-dimensional.

Li said she sees applications for world models in creative fields, robotics, or any area that warrants infinite universes. As with the work of Meta, Anduril, and other Silicon Valley heavyweights, that could mean advances in military applications, helping those on the battlefield better perceive their surroundings and anticipate their enemies' next moves.

The challenge of building world models is the scarcity of suitable data. In contrast to language, which humans have refined and documented over centuries, spatial intelligence is less developed.

"If I ask you to close your eyes right now and draw out or build a 3D model of the environment around you, it's not that easy," Li said on the No Priors podcast. "We don't have that much capability to generate extremely complicated models till we get trained."

To gather the data necessary for these models, "we require more and more sophisticated data engineering, data acquisition, data processing, and data synthesis," she said.

That makes the challenge of building a believable world even greater.

At Meta, chief AI scientist Yann LeCun has a small team dedicated to a similar project. The team uses video data to train models and runs simulations that abstract the videos at different levels.

"The basic idea is that you don't predict at the pixel level. You train a system to run an abstract representation of the video so that you can make predictions in that abstract representation, and hopefully this representation will eliminate all the details that cannot be predicted," he said at the AI Action Summit in Paris earlier this year.

That creates a simpler set of building blocks for mapping out how the world will change at a particular time.
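To make the idea of predicting in an abstract representation rather than at the pixel level concrete, here is a toy Python sketch. It is not Meta's method. The one-dimensional "video," the hand-written `encode` function (a stand-in for a learned encoder), and every name and constant in it are assumptions made for illustration. Each frame contains one bright pixel that moves steadily to the right, plus random noise that cannot be predicted.

```python
import random

WIDTH = 16  # pixels per toy frame (hypothetical value)


def make_frame(position, rng):
    """Build a 1-D frame: one bright 'object' pixel plus unpredictable noise."""
    frame = [rng.uniform(0.0, 0.3) for _ in range(WIDTH)]
    frame[position] = 1.0
    return frame


def encode(frame):
    """Hand-written stand-in for a learned encoder: keep only the object's position."""
    return frame.index(max(frame))


def fit_latent_predictor(latents):
    """Least-squares fit of z[t+1] = a * z[t] + b over consecutive latent pairs."""
    xs, ys = latents[:-1], latents[1:]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def run_demo(seed=0):
    rng = random.Random(seed)
    # The object moves one pixel to the right per frame; noise is redrawn each frame.
    video = [make_frame(t, rng) for t in range(12)]
    next_frame = make_frame(12, rng)

    # Prediction in the abstract representation: noise has been discarded.
    latents = [encode(frame) for frame in video]
    a, b = fit_latent_predictor(latents)
    predicted_latent = a * latents[-1] + b
    latent_error = abs(predicted_latent - encode(next_frame))

    # Pixel-level baseline: shift the last frame one pixel to the right.
    last = video[-1]
    predicted_pixels = [last[-1]] + last[:-1]
    pixel_error = sum((p - q) ** 2 for p, q in zip(predicted_pixels, next_frame))
    return latent_error, pixel_error


if __name__ == "__main__":
    latent_error, pixel_error = run_demo()
    print("latent-space error:", latent_error)
    print("pixel-space error:", pixel_error)
```

In this toy setup, the prediction made in the abstract representation lands on the object's true next position, while the pixel-level baseline is left with a nonzero error because it tries to reproduce noise that changes from frame to frame. A real system would learn the encoder and predictor from video rather than hard-coding them.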

LeCun, like Li, believes these models are the only way to create truly intelligent AI.

"We need AI systems that can learn new tasks really quickly," he said recently at the National University of Singapore. "They need to understand the physical world — not just text and language but the real world — have some level of common sense, and abilities to reason and plan, have persistent memory — all the stuff that we expect from intelligent entities."

Read the original article on Business Insider
