AWS Machine Learning Blog, September 13, 2024
Enabling production-grade generative AI: New capabilities lower costs, streamline production, and boost security

As generative AI moves from proofs of concept (POCs) to production, we’re seeing a massive shift in how businesses and consumers interact with data, information—and each other. In what we consider “Act 1” of the generative AI story, we saw previously unimaginable amounts of data and compute create models that showcase the power of generative AI. Just last year, many businesses, and even more individuals, were focused on learning and experimenting, and the sheer number of POCs was impressive. Thousands of customers, across diverse industries, conducted anywhere from dozens to hundreds of experiments as they explored the potential of generative AI applications and their implications.

By early 2024, we are beginning to see the start of “Act 2,” in which many POCs are evolving into production, delivering significant business value. To learn more about Act 1 and Act 2, refer to Are we prepared for “Act 2” of gen AI?. The move to a production mindset focuses new attention on key challenges as companies build and evaluate models on specific tasks and search for the leanest, fastest, and most cost-effective options. Considering—and reducing—the investment required for production workloads means bringing new efficiency to the sometimes complicated process of building, testing, and fine-tuning foundation models (FMs).

Delivering capabilities that increase efficiency and reduce costs

Offering companies multiple entry points to their generative AI journey is critical to delivering value as they move their generative AI applications into production. Our generative AI technology stack provides the services and capabilities necessary to build and scale generative AI applications—from Amazon Q (the most capable generative AI–powered assistant for accelerating software development) at the top layer to Amazon Bedrock (the easiest way to build and scale generative AI applications with foundation models) at the middle layer to Amazon SageMaker (purpose-built to help you build, train, and deploy FMs) at the foundational, bottom layer. While these layers provide different points of entry, the fundamental truth is that every generative AI journey starts at the foundational bottom layer.

Organizations that want to build their own models or want granular control are choosing Amazon Web Services (AWS) because we are helping customers use the cloud more efficiently and leverage more powerful, price-performant AWS capabilities such as petabyte-scale networking, hyperscale clustering, and the right tools to help them build. Our deep investment in this layer enhances the capabilities and efficiency of the services we provide at higher layers.

To make generative AI use cases economical, you need to run your training and inference on incredibly high-performing, cost-effective infrastructure that’s purpose-built for AI. Amazon SageMaker makes it easy to optimize at each step of the model lifecycle, whether you are building, training, or deploying. However, FM training and inference present challenges—including operational burden, overall cost, and performance lag that contributes to an overall subpar user experience. State-of-the-art generative AI models average latencies on the order of seconds, and many of today’s massive models are too large to fit into a single instance.
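
To see why many of today’s models outgrow a single instance, a rough memory estimate is enough. The sketch below uses illustrative numbers (a hypothetical 70B-parameter model served in FP16 on an 80 GB accelerator), not figures from this post:

```python
# Back-of-the-envelope memory check for a hypothetical 70B-parameter model
# served in FP16 (2 bytes per weight). All numbers are illustrative.
params = 70e9                 # parameter count
bytes_per_param = 2           # FP16 weights
weights_gb = params * bytes_per_param / 1e9

single_device_gb = 80         # memory of a typical high-end accelerator
print(f"Weights alone: {weights_gb:.0f} GB")        # 140 GB
print(f"Fits on one {single_device_gb} GB device? {weights_gb <= single_device_gb}")  # False
```

The weights alone exceed the device, before counting activations and the KV cache, which is why serving such models means sharding them across accelerators and instances.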

In addition, the blistering pace of model optimization innovations leaves model builders with months of research to learn and implement these techniques, even before finalizing deployment configurations.

Introducing Amazon Elastic Kubernetes Service (Amazon EKS) in Amazon SageMaker HyperPod

Recognizing these challenges, AWS launched Amazon SageMaker HyperPod last year. Taking efficiency one step further, earlier this week, we announced the launch of Amazon EKS support on Amazon SageMaker HyperPod. Why? Because provisioning and managing the large GPU clusters needed for AI can pose a significant operational burden. And training runs that take weeks to complete are challenging, since a single failure can derail the entire process. Ensuring infrastructure stability and optimizing performance of distributed training workloads can also pose challenges.

Amazon SageMaker HyperPod provides a fully managed service that removes the operational burden and enables enterprises to accelerate FM development at an unprecedented scale. Now, support for Amazon EKS in Amazon SageMaker HyperPod makes it possible for builders to manage their SageMaker HyperPod clusters using Amazon EKS. Builders can use a familiar Kubernetes interface while eliminating the undifferentiated heavy lifting involved in setting up and optimizing these clusters for generative AI model development at scale. SageMaker HyperPod provides a highly resilient environment that automatically detects, diagnoses, and recovers from underlying infrastructure faults so that builders can train FMs for weeks or months at a time with minimal disruption.
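
As a sketch of what this looks like in practice, the snippet below creates a HyperPod cluster orchestrated by an existing EKS cluster using boto3. All names, ARNs, and sizes are placeholders, and the field names reflect our reading of the SageMaker CreateCluster API; verify them against the current documentation before use:

```python
import boto3

sm = boto3.client("sagemaker")

response = sm.create_cluster(
    ClusterName="demo-hyperpod-eks",
    # Point HyperPod at an existing EKS control plane (placeholder ARN).
    Orchestrator={
        "Eks": {"ClusterArn": "arn:aws:eks:us-west-2:111122223333:cluster/demo-eks"}
    },
    InstanceGroups=[
        {
            "InstanceGroupName": "gpu-workers",
            "InstanceType": "ml.p5.48xlarge",
            "InstanceCount": 4,
            "ExecutionRole": "arn:aws:iam::111122223333:role/HyperPodExecutionRole",
            # Lifecycle scripts that bootstrap each node (placeholder S3 path).
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://amzn-s3-demo-bucket/lifecycle/",
                "OnCreate": "on_create.sh",
            },
        }
    ],
)
print(response["ClusterArn"])
```

Once the cluster is up, builders can submit and monitor training workloads with the kubectl and Kubernetes tooling they already use.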

Customer quote: Articul8 AI

“Amazon SageMaker HyperPod has helped us tremendously in managing and operating our computational resources more efficiently with minimum downtime. We were early adopters of the Slurm-based SageMaker HyperPod service and have benefitted from its ease-of-use and resiliency features, resulting in up to 35% productivity improvement and rapid scale up of our gen AI operations.

“As a Kubernetes house, we are now thrilled to welcome the launch of Amazon EKS support for SageMaker HyperPod. This is a game changer for us because it integrates seamlessly with our existing training pipelines and makes it even easier for us to manage and operate our large-scale Kubernetes clusters. In addition, this also helps our end customers because we are now able to package and productize this capability into our gen AI platform, enabling our customers to run their own training and fine-tuning workloads in a more streamlined manner.”

– Arun Subramaniyan, Founder and CEO of Articul8 AI

Bringing new efficiency to inference

Even with the latest advancements in generative AI modeling, the inference phase remains a significant bottleneck. We believe that businesses creating customer or consumer-facing generative AI applications shouldn’t have to sacrifice performance for cost-efficiency. They should be able to get both. That’s why two months ago, we released the inference optimization toolkit on Amazon SageMaker, a fully managed solution that provides the latest model optimization techniques, such as speculative decoding, compilation, and quantization. Available across SageMaker, this toolkit offers a simple menu of the latest optimization techniques that can be used individually or together to create an “optimization recipe.” Thanks to easy access and implementation of these techniques, customers can achieve up to ~2x higher throughput while reducing costs by ~50% for generative AI inference.
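
To make one of these techniques concrete, here is a toy, framework-agnostic sketch of the idea behind speculative decoding. Nothing below is the toolkit’s actual API; `draft_next` and `target_next` are stand-ins for a small draft model and the large target model, and real systems use a probabilistic accept/reject rule and verify all proposals in a single target forward pass:

```python
def speculative_step(tokens, draft_next, target_next, k=4):
    # 1) The cheap draft model proposes k tokens autoregressively.
    proposal, ctx = [], list(tokens)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    # 2) The expensive target model verifies the proposals: keep the longest
    #    agreeing prefix, then take the target's token at the first mismatch.
    accepted, ctx = [], list(tokens)
    for t in proposal:
        expected = target_next(ctx)
        accepted.append(expected)
        ctx.append(expected)
        if expected != t:
            break
    return tokens + accepted

# Toy models that both count upward, so every proposal is accepted and a
# single "step" emits k tokens for roughly one target-model pass.
draft = lambda ctx: ctx[-1] + 1
target = lambda ctx: ctx[-1] + 1
print(speculative_step([0], draft, target))  # [0, 1, 2, 3, 4]
```

When the draft model agrees with the target most of the time, each expensive verification pass yields several tokens instead of one, which is where the throughput gain comes from.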

Responsible model deployment that is safe and trustworthy

While cost and performance are critical issues, it’s important not to lose sight of other concerns that come to the forefront as we shift from POC to production. No matter what model you choose, it needs to be deployed in a safe, trustworthy, and responsible way. We all need to be able to unlock generative AI’s full potential while mitigating its risks. It should be easy to implement safeguards for your generative AI applications, customized to your requirements and responsible AI policies.

That’s why we built Amazon Bedrock Guardrails, a service that provides customizable safeguards so you can filter prompts and model responses. Guardrails can help block specific words or topics, and customers can also use Guardrails to help identify and prevent restricted content from reaching end users.

We also have filters for harmful content and personally identifiable information (PII) and security checks for malicious prompts, such as prompt injections. Recently, we also developed guardrails to help reduce hallucinations by checking that responses are grounded in the source material and related to the query.
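
As an illustration of how a guardrail can screen traffic independently of any particular model call, the sketch below checks a prompt with the ApplyGuardrail runtime API via boto3. The guardrail ID and version are placeholders, and the field names reflect our reading of the bedrock-runtime client, so verify them against current documentation:

```python
import boto3

runtime = boto3.client("bedrock-runtime")

resp = runtime.apply_guardrail(
    guardrailIdentifier="gr-EXAMPLE12345",  # placeholder guardrail ID
    guardrailVersion="1",
    source="INPUT",  # screen the user prompt; use "OUTPUT" for model responses
    content=[{"text": {"text": "Tell me the customer's social security number."}}],
)

if resp["action"] == "GUARDRAIL_INTERVENED":
    # The guardrail blocked or masked the content; return its safe output.
    print(resp["outputs"][0]["text"])
else:
    print("Prompt passed the guardrail; forward it to the model.")
```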

Delivering value with game-changing innovation

Our partnership with the NFL and our joint Next Gen Stats program offer impressive proof of how a production mindset is delivering true value not only to an organization but to people across the world. By using AWS AI tools and engineers, the NFL is taking tackle analysis to the next level, giving teams, broadcasters, and fans deeper insights into one of football’s most crucial skills—tackling. As fans know, tackling is a complex, evolving process that unfolds throughout each play. But traditional stats only tell part of the story. That’s why the NFL and AWS created Tackle Probability—a groundbreaking AI-powered metric that can identify a missed tackle, when and where that tackle attempt took place, and do it all in real time. For further detail, go to NFL on AWS.

Building this stat required 5 years of historical data to train an AI model on Amazon SageMaker capable of processing millions of data points per game, tracking 20 different features for each of the 11 defenders every tenth of a second. The result is a literally game-changing stat that provides unprecedented insights. Now the NFL can quantify tackling efficiency in ways never before possible. A defender can be credited with 15 tackle attempts in a game without a single miss, or we can measure how many missed tackles a running back forced. All told, there will be at least 10 new stats from this model.
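
A quick back-of-the-envelope calculation shows how those tracking figures add up to millions of data points per game. Only the 20-features, 11-defenders, and tenth-of-a-second figures come from this post; the ball-in-play time is an assumption for illustration:

```python
features_per_sample = 20 * 11   # 20 features for each of 11 defenders
samples_per_second = 10         # one sample every tenth of a second
live_play_seconds = 15 * 60     # assumed ball-in-play time per game

per_game = features_per_sample * samples_per_second * live_play_seconds
print(f"~{per_game:,} feature values per game")  # ~1,980,000
```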

For the NFL, coaches can now quantify tackling efficiency and identify players who consistently put themselves in the right position to make the play. And broadcasters can highlight broken or made tackles to fans in real time.

Building breakthroughs with AWS

The NFL is far from alone in using AWS to shift its focus from POC to production. Exciting startups like Evolutionary Scale are making it easy to generate new proteins and antibodies. Airtable is making it easier for their customers to use their data and build applications. And organizations like Slack are embedding generative AI into the workday. Fast-moving, successful startups are choosing AWS to build and accelerate their businesses. In fact, 96 percent of all AI/ML unicorns—and 90 percent of the 2024 Forbes AI 50—are AWS customers.

Why? Because we’re addressing the cost, performance, and security issues that enable production-grade generative AI applications. We’re empowering data scientists, ML engineers, and other builders with new capabilities that make generative AI development faster, easier, more secure, and less costly. We’re making FM building and tuning—and a portfolio of intuitive tools that make it happen—available to more organizations as part of our ongoing commitment to the democratization of generative AI.

Fueling the next wave of innovation

Optimizing costs, boosting production efficiency, and ensuring security—these are among the top challenges as generative AI evolves from POC to production. We’re helping address these issues by adding innovative new capabilities to Amazon SageMaker, Amazon Bedrock, and beyond. And we’re lowering the barriers to entry by making these tools available to everyone, from large enterprises with ML teams to small businesses and individual developers just getting started. Empowering more people and organizations to experiment with generative AI creates an explosion of creative new use cases and applications. That’s exactly what we’re seeing as generative AI continues its rapid evolution from a fascinating technology to a day-to-day reality—improving experiences, inspiring innovation, boosting the competitive edge, and creating significant new value.


About the author

Baskar Sridharan is the Vice President for AI/ML and Data Services & Infrastructure, where he oversees the strategic direction and development of key services, including Bedrock, SageMaker, and essential data platforms like EMR, Athena, and Glue.

Prior to his current role, Baskar spent nearly six years at Google, where he contributed to advancements in cloud computing infrastructure. Before that, he dedicated 16 years to Microsoft, playing a pivotal role in the development of Azure Data Lake and Cosmos, which have significantly influenced the landscape of cloud storage and data management.

Baskar earned a Ph.D. in Computer Science from Purdue University and has since spent over two decades at the forefront of the tech industry.

He has lived in Seattle for over 20 years, where he, his wife, and two children embrace the beauty of the Pacific Northwest and its many outdoor activities. In his free time, Baskar enjoys practicing music and playing cricket and baseball with his kids.
