TechCrunch News, December 3, 2024
AWS bets on liquid cooling for its AI servers

Ahead of its re:Invent conference, Amazon's AWS announced a series of updates to its data center strategy, the most important being that it will begin using liquid cooling for its AI servers and other machines, whether those servers are based on its homegrown Trainium chips or Nvidia's accelerators. AWS has also simplified the electrical and mechanical design of its servers and racks to improve availability and reduce energy consumption. The updates aim to improve data center energy efficiency and performance and to provide more flexible support for emerging AI workloads such as generative AI applications. AWS expects rack power density to rise substantially over the next few years, and plans to use AI to optimize resource placement and lower carbon emissions.

💧 AWS will begin using liquid cooling for its AI servers and machines, including Trainium2 chips and NVIDIA GB200 NVL72 systems, to improve performance and efficiency while supporting both traditional workloads and AI models.

💡 AWS has simplified the electrical and mechanical design of its servers and racks, including streamlined power distribution and mechanical systems, raising infrastructure availability to 99.9999% and reducing the number of racks that can be affected by electrical issues by 89%.

⚡️ AWS improves energy efficiency by reducing the number of power conversions, for example by running servers and/or HVAC systems on DC power, avoiding unnecessary AC-DC-AC conversion steps.

📊 AWS uses AI to predict the most efficient rack positions to reduce unused or underutilized power, and is rolling out its own control system with real-time diagnostics and troubleshooting.

🌍 AWS's data center strategy updates aim to support generative AI applications, improve energy efficiency, and lower carbon emissions; rack power density is expected to increase substantially in the coming years.

It’s AWS re:Invent this week, Amazon’s annual cloud computing extravaganza in Las Vegas, and as is tradition, the company has so much to announce, it can’t fit everything into its five (!) keynotes. Ahead of the show’s official opening, AWS on Monday detailed a number of updates to its overall data center strategy that are worth paying attention to.

The most important of these is that AWS will soon start using liquid cooling for its AI servers and other machines, regardless of whether those are based on its homegrown Trainium chips or Nvidia’s accelerators. Specifically, AWS notes that its Trainium2 chips (which are still in preview) and “rack-scale AI supercomputing solutions like NVIDIA GB200 NVL72” will be cooled this way.

It’s worth highlighting that AWS stresses that these updated cooling systems can integrate both air and liquid cooling. After all, there are still plenty of other servers in the data centers that handle networking and storage, for example, that don’t require liquid cooling. “This flexible, multimodal cooling design allows AWS to provide maximum performance and efficiency at the lowest cost, whether running traditional workloads or AI models,” AWS explains.

The company also announced that it is moving to more simplified electrical and mechanical designs for its servers and server racks.

“AWS’s latest data center design improvements include simplified electrical distribution and mechanical systems, which enable infrastructure availability of 99.9999%. The simplified systems also reduce the potential number of racks that can be impacted by electrical issues by 89%,” the company notes in its announcement. In part, AWS is doing this by reducing the number of times the electricity gets converted on its way from the electrical network to the server.
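For context, 99.9999% availability ("six nines") is an aggressive target: it budgets only about half a minute of downtime per year. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope: yearly downtime budget at a given availability level.
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 (ignoring leap years)

def downtime_seconds_per_year(availability: float) -> float:
    """Seconds of allowed downtime per year at the given availability."""
    return SECONDS_PER_YEAR * (1.0 - availability)

print(f"{downtime_seconds_per_year(0.999999):.1f} s/year")  # six nines -> ~31.5 s
print(f"{downtime_seconds_per_year(0.9999):.0f} s/year")    # four nines, for comparison
```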

AWS didn’t provide many more details than that, but this likely means using DC power to run the servers and/or HVAC system and avoiding many of the AC-DC-AC conversion steps (with their inherent losses) otherwise necessary.
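The efficiency argument is straightforward: conversion stages multiply, so every stage removed compounds the savings. A small illustrative calculation (the per-stage efficiencies here are hypothetical round numbers, not AWS figures):

```python
def chain_efficiency(stage_efficiencies):
    """Net efficiency of power-conversion stages in series: losses compound."""
    eff = 1.0
    for stage in stage_efficiencies:
        eff *= stage
    return eff

# Hypothetical per-stage efficiencies, for illustration only.
multi_stage = chain_efficiency([0.95, 0.95, 0.95])  # e.g. AC->DC, DC->AC, AC->DC again
single_stage = chain_efficiency([0.95])             # one rectification, then DC throughout
print(f"three conversions: {multi_stage:.1%}")  # 85.7%
print(f"one conversion:    {single_stage:.1%}")  # 95.0%
```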

“AWS continues to relentlessly innovate its infrastructure to build the most performant, resilient, secure, and sustainable cloud for customers worldwide,” said Prasad Kalyanaraman, vice president of Infrastructure Services at AWS, in Monday’s announcement. “These data center capabilities represent an important step forward with increased energy efficiency and flexible support for emerging workloads. But what is even more exciting is that they are designed to be modular, so that we are able to retrofit our existing infrastructure for liquid cooling and energy efficiency to power generative AI applications and lower our carbon footprint.”

In total, AWS says, the new multimodal cooling system and upgraded power delivery system will let the organization “support a 6x increase in rack power density over the next two years, and another 3x increase in the future.”
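Taken together, those multipliers compound: reading "6x increase" as a 6x multiplier, a further 3x works out to 18x today's rack power density. A trivial sketch (the starting density is a hypothetical placeholder, not an AWS figure):

```python
base_density_kw = 40.0                  # hypothetical current rack density, in kW
after_two_years = base_density_kw * 6   # "6x increase over the next two years"
long_term = after_two_years * 3         # "another 3x increase in the future"
print(after_two_years, long_term)       # 240.0 720.0 -> 18x the starting point
```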

In this context, AWS also notes that it is now using AI to predict the most efficient way to position racks in the data center to reduce the amount of unused or underutilized power. AWS will also roll out its own control system across its electrical and mechanical devices in the data center, which will come with built-in telemetry services for real-time diagnostics and troubleshooting.

“Data centers must evolve to meet AI’s transformative demands,” said Ian Buck, vice president of hyperscale and HPC at Nvidia. “By enabling advanced liquid cooling solutions, AI infrastructure can be efficiently cooled while minimizing energy use. Our work with AWS on their liquid cooling rack design will allow customers to run demanding AI workloads with exceptional performance and efficiency.”

