Tesla’s Dojo, a timeline

Tesla is intent on becoming an AI company, and its Full Self-Driving (FSD) technology depends on the Dojo supercomputer. Although FSD still requires human supervision today, Tesla believes that with more data and more compute it can achieve full autonomy. Dojo’s development has drawn close attention, evolving through repeated adjustments from initial concept to actual deployment. With the arrival of the Cortex supercluster, Tesla keeps increasing its investment in AI, aiming to accelerate FSD’s iteration and optimization. The Dojo project nonetheless faces risks and challenges, including technical hurdles and steep costs. Whether Tesla can realize its AI vision through Dojo remains to be seen.

🤖 **Dojo’s core goal:** Tesla wants to use the Dojo supercomputer to train its Full Self-Driving (FSD) neural networks and ultimately achieve full autonomy. Musk has said Dojo will be able to process vast amounts of video training data and efficiently run hyperspace arrays with a vast number of parameters, plenty of memory, and ultra-high bandwidth.

📈 **Dojo’s development history and timeline:** From the first public Dojo plans in 2020 through the end of 2024, Tesla poured substantial resources into Dojo’s development. Along the way it went through multiple technical iterations and goal adjustments, including the launch of the D1 chip, the installation of Dojo cabinets, and plans for Exapod clusters. By the end of 2024, however, Dojo had not delivered on all of those plans.

💰 **Competition and coexistence with Nvidia:** Tesla has taken a “dual path” on AI compute, developing Dojo while also using Nvidia GPUs. Although Musk has said Dojo has the potential to compete with Nvidia, he has also acknowledged that Nvidia hardware makes up a large share of the cost of AI training superclusters. Tesla expected to spend $3 billion to $4 billion on Nvidia hardware in 2024.

🏭 **Dojo’s deployment and expansion:** Tesla planned a “super dense, water-cooled supercomputer cluster” as part of the Giga Texas factory extension, and it has completed deployment of the Cortex supercluster, made up of roughly 50,000 Nvidia H100 GPUs. Tesla has also announced plans to build a Dojo supercomputer in Buffalo.

⚠️ **Dojo’s challenges and risks:** Musk has acknowledged that Dojo is a high-risk, high-reward project. Tesla faces multiple challenges in AI training compute, including data management, hardware procurement, and technical execution, and Dojo’s enormous price tag could put pressure on Tesla’s finances.

Elon Musk doesn’t want Tesla to be just an automaker. He wants Tesla to be an AI company, one that’s figured out how to make cars drive themselves. 

Crucial to that mission is Dojo, Tesla’s custom-built supercomputer designed to train its Full Self-Driving (FSD) neural networks. FSD isn’t actually fully self-driving; it can perform some automated driving tasks, but still requires an attentive human behind the wheel. But Tesla thinks with more data, more compute power and more training, it can cross the threshold from almost self-driving to full self-driving. 

And that’s where Dojo comes in. 

Musk has been teasing Dojo for some time, but the executive ramped up discussions about the supercomputer throughout 2024. Now that we’re in 2025, another supercomputer called Cortex has entered the chat, but Dojo’s importance to Tesla might still be existential — with EV sales slumping, investors want assurances that Tesla can achieve autonomy. Below is a timeline of Dojo mentions and promises. 

April 22, 2019 – At Tesla’s Autonomy Day, the automaker brings its AI team onstage to talk about Autopilot and Full Self-Driving, and the AI powering them both. The company shares information about its custom-built chips, designed specifically for neural networks and self-driving cars.

During the event, Musk teases Dojo, revealing that it’s a supercomputer for training AI. He also notes that all Tesla cars then in production have all the hardware necessary for full self-driving, and need only a software update.

February 2, 2020 – Musk says Tesla will soon have more than a million connected vehicles worldwide with the sensors and compute needed for full self-driving — and touts Dojo’s capabilities.

“Dojo, our training supercomputer, will be able to process vast amounts of video training data & efficiently run hyperspace arrays with a vast number of parameters, plenty of memory & ultra-high bandwidth between cores. More on this later.”

August 14, 2020 – Musk reiterates Tesla’s plan to develop a neural network training computer called Dojo “to process truly vast amounts of video data,” calling it “a beast.” He also says the first version of Dojo is “about a year away,” which would put its launch date somewhere around August 2021.

December 31, 2020 – Musk says Dojo isn’t needed, but that it will make self-driving better: “It isn’t enough to be safer than human drivers, Autopilot ultimately needs to be more than 10 times safer than human drivers.”

August 19, 2021 – The automaker officially announces Dojo at Tesla’s first AI Day, an event meant to attract engineers to Tesla’s AI team. Tesla also introduces its D1 chip, which the automaker says it will use — alongside Nvidia’s GPUs — to power the Dojo supercomputer. Tesla notes its AI cluster will house 3,000 D1 chips.

October 12, 2021 – Tesla releases a Dojo Technology whitepaper, “a guide to Tesla’s configurable floating point formats & arithmetic.” The whitepaper outlines a technical standard for a new type of binary floating-point arithmetic used in deep learning neural networks that can be implemented “entirely in software, entirely in hardware, or in any combination of software and hardware.”
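
As a minimal illustration of what a “configurable” floating-point format means (a generic sketch, not Tesla’s actual CFloat encoding), here is an 8-bit float whose exponent width and bias are parameters, so the same 8 bits can trade range for precision:

```python
import math

def encode_cfloat8(x: float, exp_bits: int = 4, bias: int = 7) -> int:
    """Pack x into a toy 8-bit float: 1 sign bit, `exp_bits` exponent
    bits, and the remaining bits as mantissa. Illustrative only; this
    is NOT the encoding from Tesla's whitepaper."""
    man_bits = 7 - exp_bits
    sign = 1 if x < 0 else 0
    x = abs(x)
    if x == 0:
        return sign << 7
    man, exp = math.frexp(x)      # x = man * 2**exp, with man in [0.5, 1)
    man, exp = man * 2, exp - 1   # renormalize so man is in [1, 2)
    e = max(0, min((1 << exp_bits) - 1, exp + bias))  # clamp biased exponent
    m = min(round((man - 1.0) * (1 << man_bits)), (1 << man_bits) - 1)
    return (sign << 7) | (e << man_bits) | m

def decode_cfloat8(b: int, exp_bits: int = 4, bias: int = 7) -> float:
    """Inverse of encode_cfloat8 (subnormals ignored for brevity)."""
    man_bits = 7 - exp_bits
    sign = -1.0 if (b >> 7) & 1 else 1.0
    e = (b >> man_bits) & ((1 << exp_bits) - 1)
    m = b & ((1 << man_bits) - 1)
    return sign * (1.0 + m / (1 << man_bits)) * 2.0 ** (e - bias)

# With 4 exponent bits only 3 mantissa bits remain, so values are coarse:
print(decode_cfloat8(encode_cfloat8(3.14159)))  # 3.25
```

Varying `exp_bits` and `bias` is the “configurable” part: a wider exponent covers more dynamic range (useful for gradients), while a wider mantissa gives more precision (useful for weights), all within the same 8-bit budget.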

August 12, 2022 – Musk says Tesla will “phase in Dojo. Won’t need to buy as many incremental GPUs next year.”

September 30, 2022 – At Tesla’s second AI Day, the company reveals that it has installed the first Dojo cabinet and load-tested it at 2.2 megawatts. Tesla says it is building one tile per day (each tile is made up of 25 D1 chips). Tesla demos Dojo onstage running a Stable Diffusion model to create an AI-generated image of a “Cybertruck on Mars.”

Importantly, the company sets a target of completing a full Exapod cluster by Q1 2023, and says it plans to build a total of seven Exapods in Palo Alto.
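
Those figures imply a rough build schedule. A back-of-the-envelope check (my arithmetic, assuming an Exapod uses the full 3,000-chip cluster described at AI Day 2021):

```python
# Back-of-the-envelope check of Tesla's stated figures (assumption:
# an Exapod corresponds to the 3,000-chip cluster from AI Day 2021).
chips_per_tile = 25           # per Tesla's AI Day 2022 remarks
chips_per_exapod = 3_000      # cluster size announced at AI Day 2021
tiles_per_exapod = chips_per_exapod // chips_per_tile  # 120 tiles

build_rate = 1                # tiles per day, per Tesla's 2022 claim
days_per_exapod = tiles_per_exapod / build_rate
print(f"{tiles_per_exapod} tiles -> ~{days_per_exapod:.0f} days per Exapod")
# 120 tiles -> ~120 days, i.e. roughly 4 months per Exapod at that rate,
# which is consistent with the Q1 2023 target for the first full cluster.
```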

April 19, 2023 – Musk tells investors during Tesla’s first-quarter earnings call that Dojo “has the potential for an order of magnitude improvement in the cost of training,” and “has the potential to become a sellable service that we would offer to other companies in the same way that Amazon Web Services offers web services.”

Musk also notes that he’d “look at Dojo as kind of a long-shot bet,” but a “bet worth making.”

June 21, 2023 – The Tesla AI X account posts that the company’s neural networks are already in customer vehicles. The thread includes a graph with a timeline of Tesla’s current and projected compute power, which places the start of Dojo production at July 2023, although it’s not clear whether this refers to the D1 chips or the supercomputer itself. Musk says the same day that Dojo is already online and running tasks at Tesla data centers.

The company also projects that Tesla’s compute will be among the top five in the world by around February 2024 (there is no indication this happened) and that Tesla will reach 100 exaflops by October 2024.
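
For scale, here is that 100-exaflop goal converted into the “H100 equivalent” units Tesla later adopted (my arithmetic, assuming roughly 1 petaflop of dense BF16 throughput per H100; real-world throughput varies with precision, sparsity, and utilization):

```python
# Rough scale check (assumption: ~1 PFLOPS of dense BF16 per H100).
target_flops = 100e18   # Tesla's stated 100-exaflop goal
h100_flops = 1e15       # ~1 petaflop per H100, approximate
print(f"~{target_flops / h100_flops:,.0f} H100-equivalents")
# -> ~100,000 H100-equivalents, versus the ~90,000 Tesla projected
#    for end of 2024 in its July 2024 investor deck (see below).
```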

July 19, 2023 – Tesla notes in its second-quarter earnings report that it has started production of Dojo. Musk also says Tesla plans to spend more than $1 billion on Dojo through 2024.

September 6, 2023 – Musk posts on X that Tesla is limited by AI training compute, but that Nvidia and Dojo will fix that. He says managing the data from the roughly 160 billion frames of video Tesla receives from its cars each day is extremely difficult.
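
To put that figure in perspective, a quick calculation based only on the number quoted above:

```python
# Scale of the ingest problem Musk describes (my arithmetic, based
# solely on the 160-billion-frames-per-day figure).
frames_per_day = 160e9
seconds_per_day = 24 * 60 * 60
print(f"~{frames_per_day / seconds_per_day / 1e6:.1f} million frames/second")
# -> ~1.9 million frames arriving every second, continuously, before
#    any storage, deduplication, or training-set curation.
```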

January 24, 2024 – During Tesla’s fourth-quarter and full-year earnings call, Musk acknowledges again that Dojo is a high-risk, high-reward project. He also says that Tesla is pursuing “the dual path of Nvidia and Dojo,” that “Dojo is working” and is “doing training jobs.” He notes Tesla is scaling it up and has “plans for Dojo 1.5, Dojo 2, Dojo 3 and whatnot.”

January 26, 2024 – Tesla announces plans to spend $500 million to build a Dojo supercomputer in Buffalo. Musk then downplays the investment somewhat, posting on X that while $500 million is a large sum, it’s “only equivalent to a 10k H100 system from Nvidia. Tesla will spend more than that on Nvidia hardware this year. The table stakes for being competitive in AI are at least several billion dollars per year at this point.”

April 30, 2024 – At TSMC’s North American Technology Symposium, the company says Dojo’s next-generation training tile — the D2, which puts the entire Dojo tile onto a single silicon wafer, rather than connecting 25 chips to make one tile — is already in production, according to IEEE Spectrum.

May 20, 2024 – Musk notes that the rear portion of the Giga Texas factory extension will include the construction of “a super dense, water-cooled supercomputer cluster.”

June 4, 2024 – A CNBC report reveals that Musk diverted thousands of Nvidia chips reserved for Tesla to X and xAI. After initially saying the report was false, Musk posts on X that Tesla didn’t have a location to send the Nvidia chips to turn them on, due to the continued construction on the south extension of Giga Texas, “so they would have just sat in a warehouse.” He notes the extension will “house 50k H100s for FSD training.”

He also posts:

“Of the roughly $10B in AI-related expenditures I said Tesla would make this year, about half is internal, primarily the Tesla-designed AI inference computer and sensors present in all of our cars, plus Dojo. For building the AI training superclusters, NVidia hardware is about 2/3 of the cost. My current best guess for Nvidia purchases by Tesla are $3B to $4B this year.”
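
Those numbers are internally consistent, as a quick check shows (my arithmetic, assuming the “2/3” Nvidia share applies to the external, training-supercluster half of the spend):

```python
# Quick consistency check of Musk's June 2024 figures (assumption:
# the "2/3" Nvidia share applies to the non-internal half of the spend).
total_ai_spend = 10e9              # "$10B in AI-related expenditures"
internal = total_ai_spend / 2      # "about half is internal"
external = total_ai_spend - internal
nvidia_share = external * (2 / 3)  # "NVidia hardware is about 2/3 of the cost"
print(f"Implied Nvidia spend: ${nvidia_share / 1e9:.1f}B")
# -> ~$3.3B, squarely inside Musk's "$3B to $4B" estimate.
```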

July 1, 2024 – Musk reveals on X that current Tesla vehicles may not have the right hardware for the company’s next-gen AI model. He says that the roughly 5x increase in parameter count with the next-gen AI “is very difficult to achieve without upgrading the vehicle inference computer.”

July 23, 2024 – During Tesla’s second-quarter earnings call, Musk says demand for Nvidia hardware is “so high that it’s often difficult to get the GPUs.”

“I think this therefore requires that we put a lot more effort on Dojo in order to ensure that we’ve got the training capability that we need,” Musk says. “And we do see a path to being competitive with Nvidia with Dojo.”

A graph in Tesla’s investor deck predicts that Tesla AI training capacity will ramp to roughly 90,000 H100-equivalent GPUs by the end of 2024, up from around 40,000 in June. Later that day on X, Musk posts that Dojo 1 will have “roughly 8k H100-equivalent of training online by end of year.” He also posts photos of the supercomputer, which appears to use the same fridge-like stainless steel exterior as Tesla’s Cybertrucks.

July 30, 2024 – AI5 is ~18 months away from high-volume production, Musk says in a reply to a post from someone claiming to start a club of “Tesla HW4/AI4 owners angry about getting left behind when AI5 comes out.”

August 3, 2024 – Musk posts on X that he did a walkthrough of “the Tesla supercompute cluster at Giga Texas (aka Cortex).” He notes that it would consist of roughly 100,000 H100/H200 Nvidia GPUs with “massive storage for video training of FSD & Optimus.”

August 26, 2024 – Musk posts on X a video of Cortex, which he refers to as “the giant new AI training supercluster being built at Tesla HQ in Austin to solve real-world AI.”

January 29, 2025 – Tesla’s Q4 and full-year 2024 earnings call includes no mention of Dojo. Cortex, Tesla’s new AI training supercluster at the Austin gigafactory, does make an appearance, however. Tesla notes in its shareholder deck that it has completed the deployment of Cortex, which is made up of roughly 50,000 Nvidia H100 GPUs.

“Cortex helped enable V13 of FSD (Supervised), which boasts major improvements in safety and comfort thanks to 4.2x increase in data, higher resolution video inputs … among other enhancements,” according to the letter. 

During the call, CFO Vaibhav Taneja notes that Tesla accelerated the buildout of Cortex to speed up the rollout of FSD V13. He says accumulated AI-related capital expenditures, including infrastructure, “so far has been approximately $5 billion,” and that he expects AI-related capex to be flat in 2025.

This story was originally published on August 10, 2024, and we will update it as new information develops.
