EnterpriseAI 2024年11月22日
DeltaAI Unveiled: How NCSA is Meeting the Demand for Next-Gen AI Research

The National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign has launched its highly anticipated DeltaAI system, an advanced AI computing and data resource designed to accelerate complex artificial intelligence, machine learning, and high-performance computing applications. Funded by the National Science Foundation, DeltaAI features advanced hardware including Nvidia H100 Hopper GPUs and GH200 Grace Hopper Superchips, delivering up to 633 petaflops of half-precision performance. The system will serve researchers nationwide through the NSF ACCESS program and the National Artificial Intelligence Research Resource (NAIRR) pilot, advancing AI research, particularly explainable AI, while also supporting traditional HPC applications. DeltaAI aims to balance AI hype with practical scientific progress, supporting work that solves real problems and improves human lives.

🤔 **DeltaAI is a companion system to NCSA's Delta supercomputer, built to meet the growing demand for GPUs, especially in artificial intelligence.** Funded by the National Science Foundation, DeltaAI is designed to accelerate complex AI, machine learning, and high-performance computing applications and to serve researchers across the country.

🖥️ **DeltaAI features 320 NVIDIA Grace Hopper GPUs, each with 96GB of memory, delivering up to 633 petaflops of half-precision performance optimized for machine learning and AI workloads.** The system also provides 14 PB of storage and a highly scalable interconnect fabric, enabling it to handle massive datasets and complex computational tasks.

💡 **DeltaAI will emphasize support for explainable AI research, including understanding how large language models are trained and perform inference.** By providing larger GPU memory, DeltaAI can accommodate bigger models and more data, helping researchers probe the inner workings of AI systems, address the "black box" problem, and improve the trustworthiness and reliability of AI systems.

🌍 **DeltaAI will be available to researchers nationwide through the NSF ACCESS program and the National Artificial Intelligence Research Resource (NAIRR) pilot.** This will foster collaboration and sharing in AI research and extend AI technology into more fields.

🤝 **DeltaAI's design accommodates both AI and traditional high-performance computing applications, providing a common platform for both kinds of research.** Its advanced architecture, particularly its multi-GPU nodes and unified memory, effectively addresses memory bandwidth limitations in HPC and improves computational efficiency.

The National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign has just launched its highly anticipated DeltaAI system.

DeltaAI is an advanced AI computing and data resource that will be a companion system to NCSA’s Delta, a 338-node, HPE Cray-based supercomputer installed in 2021. The new DeltaAI has been funded by the National Science Foundation with nearly $30 million in awards and will be accessible to researchers across the country through the NSF ACCESS program and the National Artificial Intelligence Research Resource (NAIRR) pilot. 

The system will accelerate complex AI, machine learning, and HPC applications running terabytes of data by using advanced AI hardware, including the Nvidia H100 Hopper GPUs and GH200 Grace Hopper Superchips. 

This week, HPCwire caught up with NCSA Director Bill Gropp at SC24 in Atlanta to get the inside story on the new DeltaAI system, which became fully operational last Friday.

From Delta to DeltaAI: Meeting the Growing Demand for GPUs

Gropp says DeltaAI was inspired by the increasing demand NCSA saw for GPUs while conceiving and deploying the original Delta system. 

“The name Delta comes from the fact that we saw these advances in the computing architecture, particularly in GPUs and other interfaces. And some of the community had been adopting these, but not all of the community, and we really feel that that’s an important direction for people to take,” Gropp told HPCwire. 

“So, we proposed Delta to NSF and got that funded. I think it was the first, essentially, almost-all-GPU resource since Keeneland, which was a long, long time ago, and we had expected it to be a mix of modeling simulation, like molecular dynamics, fluid flows, and AI. But as we deployed [Delta], AI just took off, and there was more and more demand.” 

The original Delta system with its Nvidia A100 GPUs and more modest amounts of GPU memory was state of the art for its time, Gropp says, but after the emergence and proliferation of large language models and other forms of generative AI, the game changed. 

“We looked at what people needed, and we realized that there was enormous demand for GPU resources for AI research and that more GPU memory is going to be needed for these larger models,” he said. 

The original Delta system at NCSA, the companion system to the new DeltaAI. (Source: NCSA)

Scaling GPU Power to Demystify AI 

The new DeltaAI system will provide approximately twice the performance of the original Delta, offering petaflops-scale double-precision (FP64) performance for tasks requiring high numerical accuracy, such as fluid dynamics or climate modeling, and a staggering 633 petaflops of half-precision (FP16) performance, optimized for machine learning and AI workloads.
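
The FP64/FP16 split reflects a real trade-off: half precision sacrifices numerical resolution for throughput and memory savings, which AI training tolerates but simulation often cannot. A minimal NumPy sketch (an illustration only, not DeltaAI software) of what half precision loses:

```python
import numpy as np

# Near 1.0, FP16 has only ~3 decimal digits of resolution (10 mantissa bits),
# so a small increment that FP64 represents exactly simply rounds away.
x = np.float64(1.0) + np.float64(1e-4)   # double precision keeps the increment
y = np.float16(1.0) + np.float16(1e-4)   # half precision rounds it off

print(x)  # 1.0001
print(y)  # 1.0 -- the increment is lost
```

This is why systems like DeltaAI keep FP64 capability for modeling and simulation while advertising their much larger FP16 throughput for AI workloads.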

This extraordinary compute capability is driven by 320 NVIDIA Grace Hopper GPUs, each equipped with 96GB of memory—resulting in a total of 384GB of GPU memory per node. The nodes are further supported by 14 PB of storage at up to 1TB/sec and are interconnected with a highly scalable fabric. 
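
The published figures also imply the node layout. A quick back-of-envelope check in Python (the per-node GPU count, node count, and aggregate memory below are derived from the article's numbers, not quoted in it):

```python
# Figures stated in the article:
gpus_total = 320          # NVIDIA Grace Hopper GPUs
mem_per_gpu_gb = 96       # GPU memory per superchip
mem_per_node_gb = 384     # stated total GPU memory per node

# Derived quantities (assumptions, not article quotes):
gpus_per_node = mem_per_node_gb // mem_per_gpu_gb       # 4 GPUs per node
nodes = gpus_total // gpus_per_node                     # 80 nodes
total_gpu_mem_tb = gpus_total * mem_per_gpu_gb / 1024   # 30 TB system-wide

print(gpus_per_node, nodes, round(total_gpu_mem_tb, 1))  # 4 80 30.0
```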

Gropp says supplemental NSF funding for Delta and DeltaAI will allow NCSA to deploy additional nodes with more than a terabyte of GPU memory per node, supporting AI research, particularly studies dedicated to understanding training and inference with LLMs. Gropp hopes this aspect of DeltaAI’s research potential will be a boon for explainable AI, as these massive memory resources enable researchers to handle larger models, process more data simultaneously, and conduct deeper explorations into the mechanics of AI systems.

“There’s a tremendous amount of research we have done in explainable AI, trustworthy AI, and understanding how inference works,” Gropp explains, emphasizing key questions driving this work: “Why do the models work this way? How can you improve their quality and reliability?” 

Understanding how AI models arrive at specific conclusions is crucial for identifying biases to ensure fairness and increase accuracy, especially in high-stakes applications like healthcare and finance. Explainable AI has emerged as a response to “black box” AI systems and models that are not easily understood or accessible and often lack transparency in how they process inputs to generate outputs. 
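
As a toy illustration of the kind of question explainability methods ask, the sketch below applies permutation importance, a generic model-agnostic technique (not one specifically attributed to NCSA), to a trivial "black box": shuffle one input feature and see how much the model's output degrades.

```python
import numpy as np

rng = np.random.default_rng(0)

# A stand-in "black box": only the first feature actually matters.
def model(X):
    return 3.0 * X[:, 0] + 0.0 * X[:, 1]

X = rng.normal(size=(1000, 2))
y = model(X)

def importance(j):
    """Shuffle feature j and measure the resulting prediction error."""
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    return np.mean((model(Xp) - y) ** 2)

print(importance(0) > importance(1))  # True: feature 0 drives the output
```

Real explainable-AI research on LLM-scale models asks the same question at vastly larger scale, which is where the large unified GPU memory of a system like DeltaAI becomes relevant.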

As AI adoption accelerates, the demand for explainability and accuracy grows in parallel, prompting questions like “How can you reduce what is essentially interpolation error in these models so that people can depend on what they’re getting out of it?” Gropp said. “Seeing that demand is why we proposed this. I think that’s why NSF funded it, and it’s why we’re so excited.” 

Democratizing AI … and HPC? 

DeltaAI will be made available to researchers nationwide through the NSF ACCESS program and the National Artificial Intelligence Research Resource (NAIRR) pilot initiative. This broad accessibility is designed to foster collaboration and extend the reach of DeltaAI’s advanced compute capabilities. 

“We are really looking forward to seeing more and more users taking advantage of our state-of-the-art GPUs, as well as taking advantage of the kind of support that we can offer, and the ability to work with other groups and share our resources,” Gropp said. 

Gropp says the new system will serve a dual role in advancing both AI and more conventional computational science. While DeltaAI’s nodes are optimized for AI-specific workloads and tools, they are equally accessible to HPC users, as the system’s design makes it a versatile platform that serves both AI research and traditional HPC applications. 

HPC workloads like molecular dynamics, fluid mechanics, and structural mechanics, will benefit significantly from the system’s advanced architecture, particularly its multi-GPU nodes and unified memory. These features address common challenges in HPC, like memory bandwidth limitations, by offering tremendous bandwidth that enhances performance for computationally intensive tasks. 

Balancing AI Hype with Practical Scientific Progress 

DeltaAI is integrated with the original Delta system on the same Slingshot network and shared file system, representing a forward-thinking approach to infrastructure design. This interconnected setup not only maximizes resource efficiency but also lays the groundwork for future scalability. 

Gropp says that plans are already in place to add new systems over the next year or two, reflecting a shift toward a continuous upgrade model rather than waiting for current hardware to reach obsolescence. While this approach may introduce challenges in managing a more heterogeneous system, the benefits of staying at the forefront of innovation far outweigh the complexities. 

This approach to infrastructure design ensures that traditional computing workloads are maintained and seamlessly integrated alongside AI advancements, fostering a balanced, versatile research environment in an AI-saturated computing landscape that can easily breed AI fatigue.

“The hype surrounding AI can be exhausting,” Gropp notes. “We do have to be careful because there is tremendous value in what AI can do. But there are a lot of things that it can’t do, and I think it will never be able to do, at least with the technologies we have.” 

DeltaAI exemplifies NCSA’s commitment to advancing both the frontiers of scientific understanding and the practical application of AI and HPC technologies. Scientific applications such as turbulence modeling are benefiting from combining HPC and AI. 

“I think that’s an exciting example of what we really want to be able to do. Not only do we want to understand it and satisfy our curiosity about it, but we’d like to be able to take that knowledge and use that to just make life better for humanity. Being able to do that translation is important,” Gropp said.
