TechCrunch News, January 7
Nvidia releases its own brand of world models

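Nvidia has launched Cosmos World Foundation Models, which can predict and generate “physics-aware” videos. The models can be fine-tuned for specific applications, come in several categories, and have uses across multiple fields, but they are not open source in the strict sense.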

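🎯 Nvidia has launched Cosmos WFM, which can generate “physics-aware” videos and is available from its API catalog and other channels.

📋 The Cosmos WFM family is divided into three categories, Nano, Super, and Ultra, with differing parameter counts.

💡 Nvidia also released related models for a range of applications; the source of the training data has drawn controversy.

🤝 Several companies have committed to piloting Cosmos WFM in different scenarios.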

Nvidia is getting into world models — AI models that take inspiration from the mental models of the world that humans develop naturally. 

At the Consumer Electronics Show in Las Vegas, the company announced that it is making openly available a family of world models that can predict and generate “physics-aware” videos. Nvidia’s calling this family Cosmos World Foundation Models, or Cosmos WFM for short.

The models, which can be fine-tuned for specific applications, are available from Nvidia’s API and NGC catalogs and the AI developer platform Hugging Face.

“Nvidia is making available the first wave of Cosmos WFMs for physics-based simulation and synthetic data generation,” the company wrote in a blog post provided to TechCrunch. “Researchers and developers, regardless of their company size, can freely use the Cosmos models under Nvidia’s permissive open model license that allows commercial usage.”

Image Credits: Nvidia

There are a number of models in the Cosmos WFM family, divided into three categories: Nano for low latency and real-time applications; Super for “highly performant baseline” models; and Ultra for maximum quality and fidelity output.

The models range in size from 4 billion to 14 billion parameters, with Nano being the smallest and Ultra being the largest. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.

As a part of Cosmos WFM, Nvidia is also releasing an “upsampling model,” a video decoder optimized for augmented reality, and guardrail models to ensure responsible use, as well as fine-tuned models for applications like generating sensor data for autonomous vehicle development. These, as well as the other Cosmos WFM models, were trained on 9,000 trillion tokens from 20 million hours of real-world human interactions, environment, industrial, robotics, and driving data, Nvidia claimed. (In AI, “tokens” represent bits of raw data — in this case, video footage.)

Nvidia wouldn’t say where this training data came from, but at least one report, and a lawsuit, alleges that the company trained on copyrighted YouTube videos without permission. We’ve reached out to Nvidia’s press team for comment and will update this piece if we hear back.

Nvidia claimed that Cosmos WFM models, given text or video frames, can generate “controllable, high-quality” synthetic data to bootstrap the training of models for robotics, driverless cars, and more.

“Nvidia Cosmos’ suite of open models means developers can customize the WFMs with data sets, such as video recordings of autonomous vehicle trips or robots navigating a warehouse, according to the needs of their target application,” Nvidia wrote in a press release. “Cosmos WFMs are purpose-built for physical AI research and development, and can generate physics-based videos from a combination of inputs, like text, image and video, as well as robot sensor or motion data.”

Nvidia said that companies including Waabi, Wayve, Foretellix, and Uber have already committed to piloting Cosmos WFMs for various use cases, from video search and curation to building AI models for self-driving vehicles.

It’s worth noting that Nvidia’s world models aren’t “open source” in the strictest sense. To abide by one widely accepted definition of “open source” AI, an AI model has to provide enough information about its design so that a person could “substantially” recreate it, and disclose any pertinent details about its training data, including the provenance and how the data can be obtained or licensed.

Nvidia hasn’t published Cosmos WFM training data details, nor has it made available all the tools needed to recreate the models from scratch. That’s probably why the tech giant is referring to the models as “open” as opposed to open source.
