AWS Machine Learning Blog, December 3, 2024
Speed up your AI inference workloads with new NVIDIA-powered capabilities in Amazon SageMaker

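AWS and NVIDIA are collaborating to launch new capabilities on Amazon SageMaker that speed up AI inference workloads. The launch has three parts. First, NVIDIA NIM microservices are available in AWS Marketplace for SageMaker, offering a range of generative AI models. Second, NVIDIA Nemotron-4, a strong multilingual reasoning model, is available in Amazon SageMaker JumpStart. Third, inference-optimized P5e and G6e instances bring NVIDIA H200 Tensor Core and L40S GPUs. Together these capabilities simplify the deployment of generative AI models and improve AI inference performance, helping customers build and deploy AI applications faster and more effectively.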

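🚀 **NVIDIA NIM microservices in AWS Marketplace:** AWS Marketplace now integrates NVIDIA NIM microservices. Customers can access and deploy generative AI models packaged by NVIDIA, such as Nemotron-4, Llama 3.1, and Mixtral, which simplifies the generative AI deployment process.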

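💡 **NVIDIA Nemotron-4 in SageMaker JumpStart:** SageMaker JumpStart now includes NVIDIA Nemotron-4, a strong multilingual model that performs well on reasoning benchmarks. It can generate diverse synthetic data that improves the performance of custom LLMs. Customers can deploy and fine-tune the model through the JumpStart interface.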

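💻 **Inference-optimized P5e and G6e instances:** AWS has launched inference-optimized P5e and G6e instances, powered by NVIDIA H200 Tensor Core and L40S GPUs respectively. They deliver higher performance and efficiency for AI inference workloads across a range of customer needs.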

This post is co-written with Abhishek Sawarkar, Eliuth Triana, Jiahong Liu and Kshitiz Gupta from NVIDIA. 

At re:Invent 2024, we are excited to announce new capabilities to speed up your AI inference workloads with NVIDIA accelerated computing and software offerings on Amazon SageMaker. These advancements build upon our collaboration with NVIDIA, which includes adding support for inference-optimized GPU instances and integration with NVIDIA technologies. They represent our continued commitment to delivering scalable, cost-effective, and flexible GPU-accelerated AI inference capabilities to our customers.

Today, we are introducing three key advancements that further expand our AI inference capabilities:

    NVIDIA NIM microservices are now available in AWS Marketplace for SageMaker Inference deployments, providing customers with easy access to state-of-the-art generative AI models.
    NVIDIA Nemotron-4 is now available on Amazon SageMaker JumpStart, significantly expanding the range of high-quality, pre-trained models available to our customers. This integration provides a powerful multilingual model that excels in reasoning benchmarks.
    Inference-optimized P5e and G6e instances are now generally available on Amazon SageMaker, giving customers access to NVIDIA H200 Tensor Core and L40S GPUs for AI inference workloads.

In this post, we will explore how you can use these new capabilities to enhance your AI inference on Amazon SageMaker. We’ll walk through the process of deploying NVIDIA NIM microservices from AWS Marketplace for SageMaker Inference. We’ll then dive into NVIDIA’s model offerings on SageMaker JumpStart, showcasing how to access and deploy the Nemotron-4 model directly in the JumpStart interface. This will include step-by-step instructions on how to find the Nemotron-4 model in the JumpStart catalog, select it for your use case, and deploy it with a few clicks. We’ll also demonstrate how to fine-tune and optimize this model for your specific requirements. Additionally, we’ll introduce you to the new inference-optimized P5e and G6e instances powered by NVIDIA H200 and L40S GPUs, showcasing how they can significantly boost your AI inference performance. By the end of this post, you’ll have a practical understanding of how to implement these advancements in your own AI projects, enabling you to accelerate your inference workloads and drive innovation in your organization.

Announcing NVIDIA NIM in AWS Marketplace for SageMaker Inference

NVIDIA NIM, part of the NVIDIA AI Enterprise software platform, offers a set of high-performance microservices designed to help organizations rapidly deploy and scale generative AI applications on NVIDIA-accelerated infrastructure. SageMaker Inference is a fully managed capability for customers to run generative AI and machine learning models at scale, providing purpose-built features and a broad array of inference-optimized instances. AWS Marketplace serves as a curated digital catalog where customers can find, buy, deploy, and manage third-party software, data, and services needed to build solutions and run businesses. We’re excited to announce that AWS customers can now access NVIDIA NIM microservices for SageMaker Inference deployments through the AWS Marketplace. This simplifies the deployment of generative AI models and helps partners and enterprises scale their AI capabilities. The initial availability includes a portfolio of models packaged as NIM microservices, such as Nemotron-4, Llama 3.1, and Mixtral, expanding the options for AI inference on Amazon SageMaker.

Key benefits of deploying NIM on AWS

How to get started with NVIDIA NIM on AWS

To deploy NVIDIA NIM microservices from the AWS Marketplace, follow these steps:

    Visit the NVIDIA NIM page on the AWS Marketplace and select your desired model, such as Llama 3.1 or Mixtral.
    Choose the AWS Regions to deploy to, the GPU instance types, and the resource allocations that fit your needs.
    Use the example notebooks to deploy with SageMaker: create the model, configure the endpoint, and deploy the model. AWS handles the orchestration of resources, networking, and scaling as needed.
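The create-model, configure-endpoint, and deploy steps can be sketched with the boto3 SageMaker client. This is a minimal sketch, not the NVIDIA example notebook. The model package ARN and role ARN are hypothetical placeholders; you would take them from your AWS Marketplace subscription and IAM setup. The default `ml.g6e.12xlarge` instance type is an assumed choice that a given NIM listing may not support. The sketch also assumes, as with other AWS Marketplace model packages, that network isolation must be enabled.

```python
def build_deployment_requests(name, model_package_arn, role_arn,
                              instance_type="ml.g6e.12xlarge",
                              instance_count=1):
    """Build request payloads for CreateModel, CreateEndpointConfig, CreateEndpoint."""
    model_request = {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        # A Marketplace subscription is referenced by its model package ARN
        "PrimaryContainer": {"ModelPackageName": model_package_arn},
        # Assumption: Marketplace model packages require network isolation
        "EnableNetworkIsolation": True,
    }
    config_request = {
        "EndpointConfigName": name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InstanceType": instance_type,  # assumed; pick one the listing supports
            "InitialInstanceCount": instance_count,
        }],
    }
    endpoint_request = {"EndpointName": name, "EndpointConfigName": name}
    return model_request, config_request, endpoint_request


def deploy(sm_client, name, model_package_arn, role_arn, **kwargs):
    """Create the model, endpoint config, and endpoint, then block until InService."""
    model_req, config_req, endpoint_req = build_deployment_requests(
        name, model_package_arn, role_arn, **kwargs)
    sm_client.create_model(**model_req)
    sm_client.create_endpoint_config(**config_req)
    sm_client.create_endpoint(**endpoint_req)
    sm_client.get_waiter("endpoint_in_service").wait(EndpointName=name)
    return name


# Example call (hypothetical names and ARNs):
#   import boto3
#   deploy(boto3.client("sagemaker"), "nim-demo",
#          "arn:aws:sagemaker:<region>:<account>:model-package/<package>",
#          "arn:aws:iam::<account>:role/<execution-role>")
```

The payload builder is separate from the AWS calls. This lets you inspect or log the exact requests before any resources are created.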

NVIDIA NIM microservices in the AWS Marketplace enable seamless deployment on SageMaker, so organizations across industries can develop, deploy, and scale their generative AI applications more quickly and effectively.

SageMaker JumpStart now includes NVIDIA models: Introducing NVIDIA NIM microservices for Nemotron models

SageMaker JumpStart is a model hub and no-code solution within SageMaker that makes advanced AI inference capabilities more accessible to AWS customers by providing a streamlined path to access and deploy popular models from different providers. It offers an intuitive interface where organizations can easily deploy popular AI models with a few clicks, eliminating the complexity typically associated with model deployment and infrastructure management. The integration offers enterprise-grade features including model evaluation metrics, fine-tuning and customization capabilities, and collaboration tools, all while giving customers full control of their deployment.

We are excited to announce that NVIDIA models are now available in SageMaker JumpStart, marking a significant milestone in our ongoing collaboration. This integration brings NVIDIA’s cutting-edge AI models directly to SageMaker Inference customers, starting with the powerful Nemotron-4 model. With JumpStart, customers can access NVIDIA’s state-of-the-art models within the SageMaker ecosystem. They can then combine these models with the scalable, price-performant inference that SageMaker provides.

Support for Nemotron-4 – A multilingual and fine-grained reasoning model

We are also excited to announce that NVIDIA Nemotron-4 is now available in the JumpStart model hub. Nemotron-4 is a cutting-edge LLM designed to generate diverse synthetic data that closely mimics real-world data, enhancing the performance and robustness of custom LLMs across various domains. Compact yet powerful, it has been fine-tuned on carefully curated datasets that emphasize high-quality sources and underrepresented domains. This refined approach enables strong results in commonsense reasoning, mathematical problem-solving, and programming tasks. Moreover, Nemotron-4 shows outstanding multilingual capabilities compared to similarly sized models. It even outperforms models over four times larger and models explicitly specialized for multilingual tasks.

Nemotron-4 – performance and optimization benefits

Nemotron-4 demonstrates strong performance on commonsense reasoning tasks such as SIQA, ARC, PIQA, and HellaSwag, with an average score of 73.4. This outperforms similarly sized models and is on par with larger ones such as Llama-2 34B. Its multilingual capabilities also surpass specialized models such as mGPT 13B and XGLM 7.5B on benchmarks like XCOPA and TyDiQA, highlighting its versatility and efficiency. When deployed through NVIDIA NIM microservices on SageMaker, these models deliver optimized inference performance, allowing businesses to generate and validate synthetic data quickly and accurately.

Through SageMaker JumpStart, customers can access pre-optimized models from NVIDIA that significantly simplify deployment and management. The NIM containers that package these models are tuned specifically for NVIDIA GPUs on AWS, providing optimized performance out of the box. NIM microservices deliver efficient deployment and scaling, allowing organizations to focus on their use cases rather than infrastructure management.

Quick start guide

    From the SageMaker Studio console, choose JumpStart, then choose the NVIDIA model family.
    Select the NVIDIA Nemotron-4 NIM microservice.
    On the model details page, choose Deploy, and a pop-up window will remind you that you need an AWS Marketplace subscription. If you haven’t subscribed to this model, you can choose Subscribe, which will direct you to the AWS Marketplace to complete the subscription. Otherwise, you can choose Deploy to proceed with model deployment.
    On the model deployment page, you can configure the endpoint name, select the endpoint instance type and instance count, in addition to other advanced settings, such as IAM role and VPC setting.
    After you finish setting up the endpoint and choose Deploy at the bottom right corner, the NVIDIA Nemotron-4 model will be deployed to a SageMaker endpoint. After the endpoint’s status is In Service, you can start testing the model by invoking the endpoint using the following code. Take a look at the example notebook if you want to deploy the model programmatically.
    import json
    import boto3

    client = boto3.client("sagemaker-runtime")
    endpoint_name = "<endpoint_name>"  # the endpoint name you configured
    payload_model = "<model_name>"  # model name expected by the deployed NIM

    messages = [
        {"role": "user", "content": "Hello! How are you?"},
        {"role": "assistant", "content": "Hi! I am quite well, how can I help you today?"},
        {"role": "user", "content": "Write a short limerick about the wonders of GPU Computing."},
    ]
    payload = {"model": payload_model, "messages": messages, "max_tokens": 100, "stream": True}
    response = client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        Body=json.dumps(payload),
        ContentType="application/json",
        Accept="application/jsonlines",
    )
    To clean up, delete the endpoint from the SageMaker Studio console or call the DeleteEndpoint API, for example with boto3:
    import boto3
    boto3.client("sagemaker").delete_endpoint(EndpointName="<endpoint_name>")  # replace with your endpoint name

SageMaker JumpStart provides an additional streamlined path to NVIDIA NIM microservices, making advanced AI capabilities even more accessible to AWS customers. Organizations can deploy Nemotron models in a few clicks through the JumpStart interface and use its model evaluation, customization, and collaboration features while keeping data private within their own VPC. By combining the scalable infrastructure of AWS with NVIDIA’s optimized models, organizations can accelerate their AI initiatives.

P5e and G6e instances powered by NVIDIA H200 Tensor Core and L40S GPUs are now available on SageMaker Inference

SageMaker now supports new P5e and G6e instances, powered by NVIDIA GPUs for AI inference.

P5e instances use NVIDIA H200 Tensor Core GPUs for AI and machine learning. These instances offer 1.7 times larger GPU memory and 1.4 times higher memory bandwidth than previous generations. Each instance has eight H200 GPUs connected through NVIDIA NVLink for high-bandwidth GPU-to-GPU communication, plus 3,200 Gbps of multi-node networking through Elastic Fabric Adapter (EFA). This makes P5e instances purpose-built for deploying and training even the most demanding ML models. These instances deliver performance, reliability, and scalability for your cutting-edge inference applications.

G6e instances, powered by NVIDIA L40S GPUs, are one of the most cost-efficient GPU instances for deploying generative AI models and the highest-performance universal GPU instances for spatial computing, AI, and graphics workloads. They offer 2 times higher GPU memory (48 GB) and 2.9 times faster GPU memory bandwidth compared to G6 instances. G6e instances deliver up to 2.5 times better performance compared to G5 instances. Customers can use G6e instances to deploy LLMs and diffusion models for generating images, video, and audio. G6e instances feature up to eight NVIDIA L40S GPUs with 384 GB of total GPU memory (48 GB of memory per GPU) and third-generation AMD EPYC processors. They also support up to 192 vCPUs, up to 400 Gbps of network bandwidth, up to 1.536 TB of system memory, and up to 7.6 TB of local NVMe SSD storage.
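When choosing between these instances, a common first check is whether the model weights fit in GPU memory. The sketch below uses the weights-only rule of thumb (parameters × bytes per parameter) plus an assumed 20% headroom for KV cache and activations. The headroom value and the example parameter counts are illustrative assumptions. The 48 GB per L40S comes from the text above; 141 GB per H200 is NVIDIA's published figure, not a number from this post.

```python
import math

# GPU memory per device in GB. 48 GB (L40S) is stated above; 141 GB is
# NVIDIA's published H200 capacity (an outside figure, not from this post).
GPU_MEMORY_GB = {"L40S": 48, "H200": 141}

BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "fp8": 1, "int8": 1}


def weights_gb(params_billion, dtype="bf16"):
    """Approximate weight memory in GB: (params * 1e9) * bytes / 1e9."""
    return params_billion * BYTES_PER_PARAM[dtype]


def min_gpus(params_billion, gpu, dtype="bf16", headroom=0.2):
    """Smallest GPU count whose memory holds the weights plus a headroom fraction.

    headroom is a rough, assumed allowance for KV cache and activations; real
    needs depend on context length, batch size, and the serving engine.
    """
    needed = weights_gb(params_billion, dtype) * (1 + headroom)
    return math.ceil(needed / GPU_MEMORY_GB[gpu])
```

For example, a 70B-parameter model in BF16 needs about 140 GB for weights alone. With the assumed headroom, `min_gpus(70, "L40S")` returns 4 and `min_gpus(70, "H200")` returns 2. Both fit within the eight GPUs of a P5e instance or the largest G6e size. A result above 8 means the model needs more than one instance or a smaller data type.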

Both instance families are now available on SageMaker Inference. Check AWS Region availability and pricing on the Amazon SageMaker pricing page.

Conclusion

These new capabilities let you deploy NVIDIA NIM microservices on SageMaker through the AWS Marketplace, use new NVIDIA Nemotron models, and tap the latest GPU instance types to power your ML workloads. We encourage you to give these offerings a look and use them to accelerate your AI workloads on SageMaker Inference.


About the authors

James Park is a Solutions Architect at Amazon Web Services. He works with Amazon.com to design, build, and deploy technology solutions on AWS, and has a particular interest in AI and machine learning. In his spare time he enjoys seeking out new cultures and new experiences, and staying up to date with the latest technology trends. You can find him on LinkedIn.

Saurabh Trikande is a Senior Product Manager for Amazon Bedrock and SageMaker Inference. He is passionate about working with customers and partners, motivated by the goal of democratizing AI. He focuses on core challenges related to deploying complex AI applications, inference with multi-tenant models, cost optimizations, and making the deployment of Generative AI models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.

Melanie Li, PhD, is a Senior Generative AI Specialist Solutions Architect at AWS based in Sydney, Australia, where her focus is on working with customers to build solutions leveraging state-of-the-art AI and machine learning tools. She has been actively involved in multiple Generative AI initiatives across APJ, harnessing the power of Large Language Models (LLMs). Prior to joining AWS, Dr. Li held data science roles in the financial and retail industries.

Marc Karp is an ML Architect with the Amazon SageMaker Service team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.

Eliuth Triana is a Developer Relations Manager at NVIDIA. He helps Amazon’s AI MLOps and DevOps engineers, scientists, and AWS technical experts master the NVIDIA computing stack for accelerating and optimizing generative AI foundation models, from data curation, GPU training, and model inference to production deployment on AWS GPU instances. In addition, Eliuth is a passionate mountain biker, skier, and tennis and poker player.

Abhishek Sawarkar is a product manager in the NVIDIA AI Enterprise team working on integrating NVIDIA AI Software in Cloud MLOps platforms. He focuses on integrating the NVIDIA AI end-to-end stack within cloud platforms and enhancing the user experience on accelerated computing.

Jiahong Liu is a Solutions Architect on the Cloud Service Provider team at NVIDIA. He assists clients in adopting machine learning and AI solutions that leverage NVIDIA-accelerated computing to address their training and inference challenges. In his leisure time, he enjoys origami, DIY projects, and playing basketball.

Kshitiz Gupta is a Solutions Architect at NVIDIA. He enjoys educating cloud customers about the GPU AI technologies NVIDIA has to offer and assisting them with accelerating their machine learning and deep learning applications. Outside of work, he enjoys running, hiking, and wildlife watching.
