NVIDIA Developer, February 16
NVIDIA Grace CPU Integrates with the Arm Software Ecosystem

The NVIDIA Grace CPU is changing data center design by delivering more power-efficient performance. Built for data center scale, the Grace CPU is designed to handle demanding workloads while consuming less power. It features 72 high-performance Arm Neoverse V2 cores connected by the NVIDIA Scalable Coherency Fabric (SCF), which ensures smooth data flow between CPU cores. The Grace CPU also uses high-speed LPDDR5X memory, delivering up to 500 GB/s of memory bandwidth while consuming just one-fifth the energy of traditional DDR memory. Based on the Arm architecture, the Grace CPU takes full advantage of NVIDIA's software and tools and integrates seamlessly with the existing Arm ecosystem.

🚀 The NVIDIA Grace CPU is designed for the modern data center, with 72 high-performance Arm Neoverse V2 cores, an NVIDIA-designed high-bandwidth SCF to maximize performance, and high-bandwidth, low-power memory, delivering up to 2x the performance of traditional x86 CPUs at the same power.

🤝 The NVIDIA Grace CPU is a standards-based Arm SBSA design that works just like any other CPU and is fully compatible with the broad Arm software ecosystem. Most open-source software already supports Arm and therefore the Grace CPU, and any software optimization and porting done on the Grace CPU also applies to the rest of the Arm Neoverse software ecosystem.

🧰 The NVIDIA HPC SDK and every CUDA component have Arm-native installers and containers. The NVIDIA container ecosystem of NVIDIA NIM microservices and NGC provides deep learning, machine learning, and HPC containers optimized for Arm. NVIDIA NIM enhances inference performance, enabling high-throughput and low-latency AI at scale.

📚 NVIDIA is also expanding its software ecosystem for Arm CPUs with a new suite of high-performance math libraries called NVIDIA Performance Libraries (NVPL). These libraries implement standard APIs, making them easy drop-in replacements for x86 at the linking stage.

The NVIDIA Grace CPU is transforming data center design by offering a new level of power-efficient performance. Built specifically for data center scale, the Grace CPU is designed to handle demanding workloads while consuming less power.

NVIDIA believes in the benefit of leveraging GPUs to accelerate every workload. However, not all workloads are accelerated. This is especially true for workloads involving complex, branchy code such as graph analytics, common in use cases like fraud detection, operational optimization, and social network analysis. As data centers face increasing power constraints, it is crucial to accelerate as many workloads as possible and to run the rest on the most efficient compute available. The Grace CPU is optimized to handle both accelerated and CPU-only tasks, delivering up to 2x the performance at the same power as conventional CPUs.

The Grace CPU features 72 high-performance and energy-efficient Arm Neoverse V2 cores, connected by the NVIDIA Scalable Coherency Fabric (SCF). This high-bandwidth fabric ensures smooth data flow between CPU cores, cache, memory, and system I/O, providing up to 3.2 TB/s of bisection bandwidth, double that of traditional CPUs. The Grace CPU also uses high-speed LPDDR5X memory with server-class reliability, delivering up to 500 GB/s of memory bandwidth while consuming just one-fifth the energy of traditional DDR memory.

In this post, we share how the Grace CPU builds on the existing Arm ecosystem while taking advantage of the vast array of NVIDIA software and tools.

Standard software infrastructure

The Grace CPU was designed to be a balanced, general-purpose CPU and to work just like any other CPU. The workflow for getting software to run on the Grace CPU is the same workflow you would use on any x86 CPU. Standard Linux distributions (Ubuntu, RHEL, SLES, and so on) and any multi-platform, open-source compiler (GCC, LLVM, and so on) all support the Grace CPU. The majority of open-source software today already supports Arm and is therefore supported on the Grace CPU. Similarly, any software optimizations and porting done on the Grace CPU also work on the rest of the Arm Neoverse software ecosystem. NVIDIA continues to work with developers and partners in the Arm ecosystem and is committed to ensuring that open-source compilers, libraries, frameworks, tools, and applications fully leverage Arm Neoverse-based CPUs like the Grace CPU.

Many cloud-native and commercial ISV applications already provide optimized executables for Arm. The Arm Developer Hub showcases selected software packages for AI, cloud, data center, 5G, networking, and edge, and provides guidance on how to migrate applications to Arm. This ecosystem is enabled by Arm standards such as the Arm Server Base System Architecture (SBSA) and the Base Boot Requirements (BBR) of the Arm SystemReady Certification Program.
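To make the same-workflow point concrete, the sketch below shows an ordinary, portable C kernel built from identical source on x86 and on the Grace CPU, with only the compiler target flag changing. The build commands in the comments are illustrative assumptions for a recent GCC or Clang that recognizes the Neoverse V2 target, not requirements from this post.

```c
/* triad.c -- a portable C kernel; nothing here is Grace-specific.
 * Hypothetical build commands (assuming a recent GCC or Clang with
 * Neoverse V2 support):
 *   x86 host:   gcc -O3 -march=native     triad.c -o triad
 *   Grace host: gcc -O3 -mcpu=neoverse-v2 triad.c -o triad
 *               (or -mcpu=native when compiling on the Grace system)
 */
#include <stdio.h>
#include <stdlib.h>

/* Simple streaming kernel the compiler can auto-vectorize
 * (NEON/SVE on Arm, SSE/AVX on x86) without source changes. */
static void triad(double *a, const double *b, const double *c,
                  double scalar, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}

int main(void)
{
    const size_t n = 1 << 20;
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    if (!a || !b || !c)
        return 1;

    for (size_t i = 0; i < n; i++) {
        b[i] = 1.0;
        c[i] = 2.0;
    }
    triad(a, b, c, 3.0, n);

#if defined(__aarch64__)
    printf("Built for AArch64: a[0] = %f\n", a[0]);
#else
    printf("Built for another ISA: a[0] = %f\n", a[0]);
#endif
    free(a); free(b); free(c);
    return 0;
}
```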
NVIDIA software supports the Arm ecosystem

Arm has invested in the software ecosystem for decades. You can innovate knowing that your software not only works but is optimized for Arm. The NVIDIA software ecosystem also takes advantage of decades of work in accelerated computing and has now been optimized for Arm.

The NVIDIA HPC SDK and every CUDA component have Arm-native installers and containers. The NVIDIA container ecosystem of NVIDIA NIM microservices and NGC provides deep learning, machine learning, and HPC containers optimized for Arm. NVIDIA NIM enhances inference performance, enabling high-throughput and low-latency AI at scale.

NVIDIA is also expanding its software ecosystem for Arm CPUs. NVIDIA previously launched a new suite of high-performance math libraries for Arm CPUs called NVIDIA Performance Libraries (NVPL). These libraries implement standard APIs, making their adoption an easy drop-in replacement from x86 at the linking stage. Similarly, math libraries such as Arm Performance Libraries (ArmPL) are tuned to maximize the performance of the Grace CPU in addition to any other Arm CPU. For example, Arm has shared how ArmPL Sparse can be used in a similar fashion to x86. ArmPL has APIs similar to those of the x86 math libraries, which means that developing a wrapper may require nothing more than a few API changes in the code.
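As a rough sketch of what a drop-in replacement at the linking stage can look like, the standard CBLAS call below is unchanged between x86 and the Grace CPU; only the link line differs. The header and library names in the comments are assumptions for illustration, so check the NVPL and ArmPL documentation for the exact names shipped with your installation.

```c
/* dgemm_demo.c -- a standard CBLAS call; the source is identical on
 * x86 and on the Grace CPU, and only the link line changes.
 * Illustrative, assumed link lines (consult the NVPL/ArmPL docs):
 *   reference/OpenBLAS:  gcc -O3 dgemm_demo.c -lopenblas
 *   NVPL BLAS (Grace):   gcc -O3 -mcpu=neoverse-v2 dgemm_demo.c -lnvpl_blas_lp64_seq
 *   ArmPL:               gcc -O3 -mcpu=neoverse-v2 dgemm_demo.c -larmpl_lp64
 * Some implementations ship their own header (for example armpl.h);
 * adjust the include below accordingly. */
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    /* C = alpha * A * B + beta * C, with 2x2 row-major matrices. */
    const double A[4] = {1.0, 2.0,
                         3.0, 4.0};
    const double B[4] = {5.0, 6.0,
                         7.0, 8.0};
    double C[4]       = {0.0, 0.0,
                         0.0, 0.0};

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,      /* M, N, K   */
                1.0, A, 2,    /* alpha, A, lda */
                B, 2,         /* B, ldb    */
                0.0, C, 2);   /* beta, C, ldc  */

    printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```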

NVIDIA is an active participant in open-source software communities such as those for the GCC and LLVM compilers. If you do not want to wait for these regular releases and want to build code that performs optimally on the Grace CPU, the latest optimizations are also made available through the Clang distribution.

Seamlessly moving your software to Arm

The Arm software ecosystem is large and growing, with hundreds of open-source projects and commercial ISVs already supporting the Arm architecture. If your application is not yet supported, you may only need to recompile the source code, and a variety of tools are available to help you do so.

For more information about application porting and optimization, see the NVIDIA Grace Performance Tuning Guide. It includes instructions for setting up and optimizing performance on the Grace CPU, and it provides high-level developer guidance on Arm SIMD programming, the Arm memory model, and other details. Use this guide to help you realize the best possible performance for your particular NVIDIA Grace system.

Figure 1. Running software on the Grace CPU uses the same optimization process as for any other CPU
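The tuning guide covers Arm SIMD programming in depth; as a small, hypothetical illustration that is not taken from the guide itself, the vector-length-agnostic SVE daxpy kernel below builds with any SVE-capable compiler (for example, gcc -O3 -mcpu=neoverse-v2).

```c
/* sve_daxpy.c -- vector-length-agnostic daxpy (y = a*x + y) using SVE
 * ACLE intrinsics. Illustrative sketch only; assumes an SVE-capable
 * compiler, e.g. gcc -O3 -mcpu=neoverse-v2 sve_daxpy.c */
#include <stdio.h>
#include <stdint.h>
#include <arm_sve.h>

static void daxpy_sve(int64_t n, double a, const double *x, double *y)
{
    /* svcntd() is the number of 64-bit lanes in one SVE vector; the
     * predicate from svwhilelt_b64 masks off the loop tail. */
    for (int64_t i = 0; i < n; i += svcntd()) {
        svbool_t pg = svwhilelt_b64(i, n);
        svfloat64_t vx = svld1(pg, &x[i]);
        svfloat64_t vy = svld1(pg, &y[i]);
        vy = svmla_x(pg, vy, vx, svdup_f64(a)); /* vy += a * vx */
        svst1(pg, &y[i], vy);
    }
}

int main(void)
{
    double x[5] = {1, 2, 3, 4, 5};
    double y[5] = {10, 10, 10, 10, 10};
    daxpy_sve(5, 2.0, x, y);
    for (int i = 0; i < 5; i++)
        printf("%g ", y[i]);   /* expected: 12 14 16 18 20 */
    printf("\n");
    return 0;
}
```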

Summary

The NVIDIA Grace CPU is designed for the modern data center, with 72 high-performance Arm Neoverse V2 cores, an NVIDIA-designed high-bandwidth SCF to maximize performance, and high-bandwidth, low-power memory. It can deliver up to 2x the performance in the same power envelope as leading traditional x86 CPUs. The NVIDIA Grace CPU is a standards-based Arm SBSA design that works just like any other CPU and is fully compatible with the broad Arm software ecosystem.

For more information about software and system setup, see NVIDIA Grace CPU.
