The Verge - Artificial Intelligences, October 31, 2024
Waymo explores using Google’s Gemini to train its robotaxis

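Waymo has developed EMMA, a new training model built on Google's multimodal large language model (MLLM) Gemini. It processes sensor data to generate future trajectories for autonomous vehicles, helping Waymo's driverless vehicles make decisions. It is one of the first indications that Waymo intends to use MLLMs in its operations. The model performs well in several areas but has limitations that call for further research.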

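🥇 Waymo used Google's multimodal large language model Gemini to develop EMMA, an end-to-end multimodal model that processes sensor data and generates future trajectories for autonomous vehicles, helping them make driving decisions.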

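🚗 EMMA excelled at trajectory prediction, object detection, and road graph understanding, and helped driverless cars find the right route in complex environments, such as when encountering various animals or construction in the road.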

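⚠️ EMMA has limitations: it cannot incorporate 3D sensor inputs from lidar or radar, and it can process only a small number of image frames at a time. There are also risks to using MLLMs to train robotaxis that the research paper does not mention.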

Cath Virginia / The Verge | Photo from Getty Images

Waymo has long touted its ties to Google’s DeepMind and its decades of AI research as a strategic advantage over its rivals in the autonomous driving space. Now, the Alphabet-owned company is taking it a step further by developing a new training model for its robotaxis built on Google’s multimodal large language model (MLLM) Gemini.

Waymo released a new research paper today that introduces an “End-to-End Multimodal Model for Autonomous Driving,” also known as EMMA. This new end-to-end training model processes sensor data to generate “future trajectories for autonomous vehicles,” helping Waymo’s driverless vehicles make decisions about where to go and how to avoid obstacles.

But more importantly, this is one of the first indications that the leader in autonomous driving has designs to use MLLMs in its operations. And it’s a sign that these LLMs could break free of their current use as chatbots, email organizers, and image generators and find application in an entirely new environment on the road. In its research paper, Waymo is proposing “to develop an autonomous driving system in which the MLLM is a first class citizen.”

The paper outlines how, historically, autonomous driving systems have developed specific “modules” for the various functions, including perception, mapping, prediction, and planning. This approach has proven useful for many years but has problems scaling “due to the accumulated errors among modules and limited inter-module communication.” Moreover, these modules could struggle to respond to “novel environments” because, by nature, they are “pre-defined,” which can make it hard to adapt.
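To see why errors accumulating across modules matters, consider a rough back-of-the-envelope sketch. It assumes a strictly sequential pipeline in which the modules fail independently and any upstream mistake carries through to the end. The reliability figures and the function name are hypothetical, chosen only for illustration; they are not taken from Waymo's paper.

```python
from math import prod

# Hypothetical per-module reliabilities, chosen only for illustration.
# These are NOT figures from Waymo's paper.
MODULE_RELIABILITY = {
    "perception": 0.99,
    "mapping": 0.99,
    "prediction": 0.98,
    "planning": 0.98,
}


def pipeline_reliability(reliabilities):
    """Chance that every stage of a sequential pipeline is correct.

    Simplifying assumptions: stages fail independently, and any upstream
    error propagates to the final output.
    """
    return prod(reliabilities.values())


if __name__ == "__main__":
    overall = pipeline_reliability(MODULE_RELIABILITY)
    weakest = min(MODULE_RELIABILITY.values())
    print(f"weakest single module: {weakest:.3f}")
    print(f"whole pipeline:        {overall:.3f}")
```

Under these assumptions, the pipeline as a whole is less reliable than its weakest single module, which is one simplified way to picture the scaling problem the paper describes.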

Waymo says that MLLMs like Gemini present an interesting solution to some of these challenges for two reasons: the chatbot is a “generalist” trained on vast sets of scraped data from the internet “that provide rich ‘world knowledge’ beyond what is contained in common driving logs”; and it demonstrates “superior” reasoning capabilities through techniques like “chain-of-thought reasoning,” which mimics human reasoning by breaking down complex tasks into a series of logical steps.

Screenshot: Waymo
Waymo’s EMMA model.

Waymo developed EMMA as a tool to help its robotaxis navigate complex environments. The company identified several situations in which the model helped its driverless cars find the right route, including encountering various animals or construction in the road.

Other companies, like Tesla, have spoken extensively about developing end-to-end models for their autonomous cars. Elon Musk claims that the latest version of Tesla’s Full Self-Driving system (12.5.5) uses an “end-to-end neural nets” AI system that translates camera images into driving decisions.

This is a clear indication that Waymo, which has a lead on Tesla in deploying real driverless vehicles on the road, is also interested in pursuing an end-to-end system. The company said that its EMMA model excelled at trajectory prediction, object detection, and road graph understanding.

“This suggests a promising avenue of future research, where even more core autonomous driving tasks could be combined in a similar, scaled-up setup,” the company said in a blog post today.

But EMMA also has its limitations, and Waymo acknowledges that further research will be needed before the model is put into practice. For example, EMMA couldn’t incorporate 3D sensor inputs from lidar or radar, which Waymo said was “computationally expensive.” And it could only process a small number of image frames at a time.

There are also risks to using MLLMs to train robotaxis that go unmentioned in the research paper. Chatbots like Gemini often hallucinate or fail at simple tasks like reading clocks or counting objects. Waymo has very little margin for error when its autonomous vehicles are traveling 40 mph down a busy road. More research will be needed before these models can be deployed at scale, and Waymo is clear about that.

“We hope that our results will inspire further research to mitigate these issues,” the company’s research team writes, “and to further evolve the state of the art in autonomous driving model architectures.”

