MarkTechPost@AI · April 11, 03:30
Google AI Introduces Ironwood: A Google TPU Purpose-Built for the Age of Inference

At Google Cloud Next 2025, Google introduced Ironwood, its latest-generation Tensor Processing Unit (TPU), purpose-built for large-scale AI inference workloads. The launch marks a shift in Google's strategic focus toward optimizing inference infrastructure, reflecting the growing operational emphasis on deploying AI models rather than training them. The seventh generation of Google's TPU architecture, Ironwood delivers substantial gains in compute performance, memory capacity, and energy efficiency, and will help meet demand across industries for scalable, responsive, and cost-effective AI systems.

🚀 Ironwood is Google's seventh-generation TPU architecture, designed for AI inference workloads. Each chip delivers a peak throughput of 4,614 teraflops (TFLOPs) and carries 192 GB of high-bandwidth memory (HBM) with bandwidth of up to 7.4 terabytes per second (TB/s). Ironwood can be configured with 256 or 9,216 chips, and the larger cluster delivers up to 42.5 exaflops of compute, making it one of the most powerful AI accelerators in the industry.

💡 Unlike previous TPU generations, which balanced training and inference workloads, Ironwood is engineered specifically for inference. This reflects an industry trend, particularly for large language and generative models, in which inference is becoming the dominant workload in production environments. Low latency and high throughput are critical there, and Ironwood is designed to meet those demands efficiently.

⚙️ Ironwood's key architectural improvement is an enhanced SparseCore, which accelerates the sparse operations common in ranking and retrieval workloads. This targeted optimization reduces the need for excessive data movement across the chip, improving latency and power consumption for specific inference-heavy use cases.

⚡ Ironwood significantly improves energy efficiency, delivering more than twice the performance per watt of its predecessor. As AI deployments scale, energy use becomes increasingly important, with both economic and environmental implications. Ironwood's improvements help address these challenges in large-scale cloud infrastructure.

☁️ The TPU is integrated into Google's broader AI Hypercomputer framework, a modular compute platform combining high-speed networking, custom silicon, and distributed storage. This integration simplifies the deployment of resource-intensive models, letting developers serve real-time AI applications without extensive configuration or tuning.

At the 2025 Google Cloud Next event, Google introduced Ironwood, its latest generation of Tensor Processing Units (TPUs), designed specifically for large-scale AI inference workloads. This release marks a strategic shift toward optimizing infrastructure for inference, reflecting the increasing operational focus on deploying AI models rather than training them.

Ironwood is the seventh generation in Google’s TPU architecture and brings substantial improvements in compute performance, memory capacity, and energy efficiency. Each chip delivers a peak throughput of 4,614 teraflops (TFLOPs) and includes 192 GB of high-bandwidth memory (HBM), supporting memory bandwidth of up to 7.4 terabytes per second (TB/s). Ironwood can be deployed in configurations of 256 or 9,216 chips, with the larger cluster offering up to 42.5 exaflops of compute, making it one of the most powerful AI accelerators in the industry.
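The pod-level figure follows directly from the per-chip throughput. As a quick sanity check (plain arithmetic on the published numbers; the numeric precision behind the TFLOPs rating is not specified here):

```python
# Back-of-the-envelope check of the published Ironwood figures.
per_chip_tflops = 4_614      # peak per-chip throughput (TFLOPs)
pod_sizes = [256, 9_216]     # the two published pod configurations

for chips in pod_sizes:
    exaflops = chips * per_chip_tflops / 1e6  # 1 exaflop = 10^6 teraflops
    print(f"{chips:>5} chips -> {exaflops:.1f} exaflops")

# Output:
#   256 chips -> 1.2 exaflops
#  9216 chips -> 42.5 exaflops
```

The 42.5-exaflop headline is simple linear scaling of the per-chip peak across the full 9,216-chip pod.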

Unlike previous TPU generations that balanced training and inference workloads, Ironwood is engineered specifically for inference. This reflects a broader industry trend where inference, particularly for large language and generative models, is emerging as the dominant workload in production environments. Low-latency and high-throughput performance are critical in such scenarios, and Ironwood is designed to meet those demands efficiently.

A key architectural advancement in Ironwood is the enhanced SparseCore, which accelerates sparse operations commonly found in ranking and retrieval-based workloads. This targeted optimization reduces the need for excessive data movement across the chip and improves both latency and power consumption for specific inference-heavy use cases.
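For context, the kernel family SparseCore targets looks like an embedding lookup over a large table followed by a pooling reduction, the core access pattern of recommendation and ranking models. Below is a minimal illustrative sketch of that pattern in JAX; it uses only standard JAX APIs and nothing SparseCore- or Ironwood-specific, since no public programming interface is described here:

```python
import jax
import jax.numpy as jnp

# Sparse access pattern typical of ranking/retrieval models: each example
# touches only a handful of rows in a large embedding table.
vocab_size, embed_dim = 50_000, 128
table = jax.random.normal(jax.random.PRNGKey(0), (vocab_size, embed_dim))

# Two examples, each with a few active feature IDs (ragged in real systems;
# padded to a fixed width here for simplicity).
feature_ids = jnp.array([[17, 4_210, 31_337, 9],
                         [2, 2, 48_000, 123]])

def pooled_embedding(ids):
    # Gather a few rows from the large table (sparse reads) and mean-pool.
    return table[ids].mean(axis=0)

pooled = jax.vmap(pooled_embedding)(feature_ids)
print(pooled.shape)  # (2, 128)
```

On general-purpose hardware, gathers like `table[ids]` are memory-bound; a unit specialized for them is what reduces the data movement described above.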

Ironwood also improves energy efficiency significantly, offering more than double the performance-per-watt compared to its predecessor. As AI model deployment scales, energy usage becomes an increasingly important constraint—both economically and environmentally. The improvements in Ironwood contribute toward addressing these challenges in large-scale cloud infrastructure.

The TPU is integrated into Google’s broader AI Hypercomputer framework, a modular compute platform combining high-speed networking, custom silicon, and distributed storage. This integration simplifies the deployment of resource-intensive models, enabling developers to serve real-time AI applications without extensive configuration or tuning.
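From a developer's point of view, targeting this stack typically looks like ordinary accelerator code: a framework such as JAX enumerates the attached TPU devices and compiles to them through XLA. A minimal sketch using standard JAX APIs (nothing here is Ironwood-specific; which TPU generation backs `jax.devices()` depends on the Cloud TPU runtime you provision):

```python
import jax
import jax.numpy as jnp

# On a Cloud TPU VM, JAX discovers the attached TPU cores automatically;
# the identical code falls back to CPU or GPU elsewhere.
print(jax.devices())  # e.g. [TpuDevice(id=0), TpuDevice(id=1), ...]

@jax.jit  # compiled via XLA for whichever backend is present
def matmul(a, b):
    return a @ b

a = jnp.ones((1024, 1024))
b = jnp.ones((1024, 1024))
print(matmul(a, b).shape)  # (1024, 1024)
```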

This launch also signals Google’s intent to remain competitive in the AI infrastructure space, where companies such as Amazon and Microsoft are developing their own in-house AI accelerators. While industry leaders have traditionally relied on GPUs, particularly from Nvidia, the emergence of custom silicon solutions is reshaping the AI compute landscape.

Ironwood’s release reflects the growing maturity of AI infrastructure, where efficiency, reliability, and deployment readiness are now as important as raw compute power. By focusing on inference-first design, Google aims to meet the evolving needs of enterprises running foundation models in production—whether for search, content generation, recommendation systems, or interactive applications.

In summary, Ironwood represents a targeted evolution in TPU design. It prioritizes the needs of inference-heavy workloads with enhanced compute capabilities, improved efficiency, and tighter integration with Google Cloud’s infrastructure. As AI transitions into an operational phase across industries, hardware purpose-built for inference will become increasingly central to scalable, responsive, and cost-effective AI systems.




