AWS Machine Learning Blog May 31, 01:03
Architect a mature generative AI foundation on AWS

 

This article examines the challenges enterprises face when building generative AI applications, such as siloed AI initiatives, redundant processes, and inconsistent governance frameworks. To address these, it proposes a unified approach in which foundational building blocks are offered as services to individual business units, creating a centrally governed and operated generative AI platform. The platform provides core services, reusable components, and blueprints while applying standardized security and governance policies, thereby streamlining development, scaling AI capabilities, mitigating risk, optimizing costs, and accelerating innovation.

💡**Model hub and tool/agent hub**: At the core of a generative AI foundation is a model hub that centrally manages the pretrained and customized models used across the enterprise and ensures they have passed security and legal review. The platform should also provide a tool/agent hub so users can discover and connect to tools and agents.

🔑**Gateway**: The model gateway provides secure access to the model hub through standardized APIs. Its key features include access control, a unified API, rate limiting, cost attribution, scaling and load balancing, guardrails, and caching, ensuring isolation between teams and business units while managing API requests and optimizing resource usage.

🔄**Orchestration**: The orchestration service encapsulates generative AI workflows, which typically involve multiple steps such as model invocation, data source integration, tool use, and API calls. Workflows can be deterministic (such as the RAG pattern) or agent-based, with an LLM planning and reasoning. The platform should provide primitives such as models, vector databases, and guardrails, plus higher-level services for defining AI workflows, agents, and multi-agent systems.

⚙️**Model customization**: The platform should offer model customization capabilities, including continued pre-training, fine-tuning, and alignment. To support these techniques, it needs scalable infrastructure for data storage and training, along with services to orchestrate tuning and training pipelines, register and govern models, and host them.

📊**Data management**: The platform should provide data management capabilities, including integration with internal and external data sources for RAG or model customization; prebuilt RAG templates and blueprints covering vector database selection, data chunking, embedding, and indexing; and data processing pipelines for model customization, including tools for creating labeled and synthetic datasets.

Generative AI applications seem simple—invoke a foundation model (FM) with the right context to generate a response. In reality, it’s a much more complex system involving workflows that invoke FMs, tools, and APIs and that use domain-specific data to ground responses with patterns such as Retrieval Augmented Generation (RAG) and workflows involving agents. Safety controls need to be applied to input and output to prevent harmful content, and foundational elements have to be established such as monitoring, automation, and continuous integration and delivery (CI/CD), which are needed to operationalize these systems in production.

Many organizations have siloed generative AI initiatives, with development managed independently by various departments and lines of businesses (LOBs). This often results in fragmented efforts, redundant processes, and the emergence of inconsistent governance frameworks and policies. Inefficiencies in resource allocation and utilization drive up costs.

To address these challenges, organizations are increasingly adopting a unified approach to build applications where foundational building blocks are offered as services to LOBs and teams for developing generative AI applications. This approach facilitates centralized governance and operations. Some organizations use the term “generative AI platform” to describe this approach. This can be adapted to different operating models of an organization: centralized, decentralized, and federated. A generative AI foundation offers core services, reusable components, and blueprints, while applying standardized security and governance policies.

This approach gives organizations many key benefits, such as streamlined development, the ability to scale generative AI development and operations across the organization, mitigated risk because central management simplifies the implementation of governance frameworks, optimized costs through reuse, and accelerated innovation as teams can quickly build and ship use cases.

In this post, we give an overview of a well-established generative AI foundation, dive into its components, and present an end-to-end perspective. We look at different operating models and explore how such a foundation can operate within those boundaries. Lastly, we present a maturity model that helps enterprises assess their evolution path.

Overview

Laying out a strong generative AI foundation includes offering a comprehensive set of components to support the end-to-end generative AI application lifecycle. The following diagram illustrates these components.

In this section, we discuss the key components in more detail.

Hub

At the core of the foundation are multiple hubs that include:

Gateway

A model gateway offers secure access to the model hub through standardized APIs. The gateway is built as a multi-tenant component to provide isolation across onboarded teams and business units. Key features of a gateway include:

The AWS Solutions Library offers solution guidance to set up a multi-provider generative AI gateway. The solution uses an open source LiteLLM proxy wrapped in a container that can be deployed on Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). This offers organizations a building block to develop an enterprise-wide model hub and gateway. The generative AI foundation can start with the gateway and offer additional features as it matures.
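To make the multi-tenant gateway idea concrete, here is a minimal sketch in Python of a gateway that exposes a unified `invoke()` API, enforces per-tenant rate limits with a token bucket, and meters usage for cost attribution. All class and function names are hypothetical illustrations, not part of any AWS or LiteLLM API:

```python
import time
from dataclasses import dataclass, field


@dataclass
class TenantQuota:
    """Token bucket tracking a tenant's request allowance."""
    capacity: int                 # maximum burst of requests
    refill_rate: float            # tokens replenished per second
    tokens: float = 0.0
    last_refill: float = field(default_factory=time.monotonic)

    def allow(self) -> bool:
        """Refill the bucket based on elapsed time, then try to spend a token."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ModelGateway:
    """Minimal multi-tenant gateway: unified API, rate limits, usage metering."""

    def __init__(self, backends):
        self.backends = backends  # model_id -> callable standing in for a provider
        self.quotas = {}          # tenant_id -> TenantQuota
        self.usage = {}           # tenant_id -> request count, for cost attribution

    def register_tenant(self, tenant_id, capacity, refill_rate):
        self.quotas[tenant_id] = TenantQuota(capacity=capacity,
                                             refill_rate=refill_rate,
                                             tokens=capacity)

    def invoke(self, tenant_id, model_id, prompt):
        quota = self.quotas.get(tenant_id)
        if quota is None:
            raise PermissionError(f"tenant {tenant_id} is not onboarded")
        if not quota.allow():
            raise RuntimeError("rate limit exceeded")
        self.usage[tenant_id] = self.usage.get(tenant_id, 0) + 1
        return self.backends[model_id](prompt)
```

A production gateway would add authentication, caching, and load balancing across providers, but the isolation boundary (per-tenant quota and usage record) sits in the same place.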

The gateway pattern for the tool/agent hub is still evolving. The model gateway can serve as a universal gateway to all the hubs, or individual hubs can have their own purpose-built gateways.

Orchestration

Orchestration encapsulates generative AI workflows, which are usually multi-step processes. The steps could involve invoking models, integrating data sources, using tools, or calling APIs. Workflows can be deterministic, where they are created as predefined templates. An example of a deterministic flow is the RAG pattern. In this pattern, a search engine is used to retrieve relevant sources and augment the prompt context with that data before the model generates the response to the user prompt. This aims to reduce hallucination and encourage responses grounded in verified content.
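The deterministic RAG flow described above can be sketched end to end in a few lines. This is a toy illustration: the retriever is a keyword matcher standing in for a real search engine or vector store, and `model` is any callable (in practice, an FM invocation through the gateway):

```python
def retrieve(query, documents, top_k=2):
    """Toy keyword retriever standing in for a search engine or vector store."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(d.lower().split())), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:top_k] if score > 0]


def rag_answer(query, documents, model):
    """Deterministic RAG flow: retrieve -> augment prompt -> invoke model."""
    context = "\n".join(retrieve(query, documents))
    prompt = (f"Answer using only this context:\n{context}\n\n"
              f"Question: {query}")
    return model(prompt)
```

The key property is that the control flow is fixed by the template: retrieval always precedes generation, which is what makes this pattern predictable and easy to evaluate.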

Alternatively, complex workflows can be designed using agents where a large language model (LLM) decides the flow by planning and reasoning. During reasoning, the agent can decide when to continue thinking, call external tools (such as APIs or search engines), or submit its final response. Multi-agent orchestration is used to tackle even more complex problem domains by defining multiple specialized subagents, which can interact with each other to decompose and complete a task requiring different knowledge or skills. A generative AI foundation can provide primitives such as models, vector databases, and guardrails as a service and higher-level services for defining AI workflows, agents and multi-agents, tools, and also a catalog to encourage reuse.
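In contrast to the fixed RAG template, the agent loop below lets the planner decide the flow at each step. This is a minimal sketch with an assumed action format (`{"tool": ..., "input": ...}` or `{"final": ...}`); in practice the planner is an LLM and the scratchpad holds its reasoning trace and tool observations:

```python
def run_agent(planner, tools, task, max_steps=5):
    """Minimal agent loop: the planner (an LLM in practice) inspects the
    scratchpad and either calls a tool or submits a final response."""
    scratchpad = [("task", task)]
    for _ in range(max_steps):
        action = planner(scratchpad)
        if "final" in action:                     # agent decides it is done
            return action["final"]
        observation = tools[action["tool"]](action["input"])
        scratchpad.append((action["tool"], observation))
    raise RuntimeError("agent exceeded step budget")
```

The `max_steps` budget is the safety valve a foundation should impose centrally, because an LLM-driven loop has no inherent termination guarantee.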

Model customization

A key foundational capability that can be offered is model customization, including the following techniques:

For the preceding techniques, the foundation should provide scalable infrastructure for data storage and training, a mechanism to orchestrate tuning and training pipelines, a model registry to centrally register and govern the model, and infrastructure to host the model.
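The pipeline-plus-registry shape described above can be sketched as follows. This is an illustrative design, not a real AWS API: the registry keeps versioned models with an approval status so governance gates deployment, and the pipeline wires tuning, evaluation, and registration together:

```python
class ModelRegistry:
    """Central registry: versioned models with an approval status for governance."""

    def __init__(self):
        self._models = {}  # name -> list of version records

    def register(self, name, artifact, metrics):
        versions = self._models.setdefault(name, [])
        versions.append({"version": len(versions) + 1, "artifact": artifact,
                         "metrics": metrics, "status": "pending_review"})
        return versions[-1]["version"]

    def approve(self, name, version):
        self._models[name][version - 1]["status"] = "approved"

    def latest_approved(self, name):
        approved = [m for m in self._models.get(name, []) if m["status"] == "approved"]
        return approved[-1] if approved else None


def tuning_pipeline(base_model, dataset, train_fn, eval_fn, registry, name):
    """Orchestrate tune -> evaluate -> register; hosting waits for approval."""
    artifact = train_fn(base_model, dataset)
    metrics = eval_fn(artifact)
    return registry.register(name, artifact, metrics)
```

The point of the `pending_review` default is that a freshly tuned model is never servable until a governance step explicitly approves it.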

Data management

Organizations typically have multiple data sources, and data from these sources is mostly aggregated in data lakes and data warehouses. Common datasets can be made available as a foundational offering to different teams. The following are additional foundational components that can be offered:
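One such component is a prebuilt RAG ingestion template covering chunking, embedding, and indexing. The sketch below illustrates those three steps with deliberately toy pieces: overlapping word-window chunking, a bag-of-words stand-in for an embedding model, and brute-force cosine search standing in for a vector database:

```python
import math
import re
from collections import Counter


def chunk(text, max_words=40, overlap=10):
    """Split text into overlapping word windows, a common pre-indexing step."""
    words = text.split()
    step = max_words - overlap
    return [" ".join(words[i:i + max_words])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Toy bag-of-words 'embedding'; a real foundation uses an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def index_and_search(chunks, query, top_k=1):
    """Brute-force similarity search standing in for a vector database."""
    vectors = [(c, embed(c)) for c in chunks]
    q = embed(query)
    ranked = sorted(vectors, key=lambda cv: cosine(q, cv[1]), reverse=True)
    return [c for c, _ in ranked[:top_k]]
```

Offering this as a template lets teams swap in their chosen chunking strategy, embedding model, and vector store without redesigning the pipeline.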

GenAIOps

Generative AI operations (GenAIOps) encompasses overarching practices of managing and automating operations of generative AI systems. The following diagram illustrates its components.

Fundamentally, GenAIOps falls into two broad categories:

In addition, operationalization involves implementing CI/CD processes for automating deployments, integrating evaluation and prompt management systems, and collecting logs, traces, and metrics to optimize operations.

Observability

Observability for generative AI needs to account for the probabilistic nature of these systems—models might hallucinate, responses can be subjective, and troubleshooting is harder. Like other software systems, logs, metrics, and traces should be collected and centrally aggregated. There should be tools to generate insights out of this data that can be used to optimize the applications even further. In addition to component-level monitoring, as generative AI applications mature, deeper observability should be implemented, such as instrumenting traces, collecting real-world feedback, and looping it back to improve models and systems. Evaluation should be offered as a core foundational component, and this includes automated and human evaluation and LLM-as-a-judge pipelines along with storage of ground truth data.
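An LLM-as-a-judge evaluation pipeline, as mentioned above, can be sketched as follows. The rubric and prompt format are assumptions for illustration; real pipelines use structured rubrics and a capable judge model rather than the stub used in this example:

```python
def judge_response(judge_model, question, answer, reference):
    """LLM-as-a-judge: ask a judge model to grade an answer 1-5 against
    ground truth, expecting a single digit back."""
    prompt = (f"Question: {question}\n"
              f"Reference: {reference}\n"
              f"Answer: {answer}\n"
              f"Grade 1-5, reply with a single digit.")
    reply = judge_model(prompt)
    score = int(reply.strip()[0])
    if not 1 <= score <= 5:
        raise ValueError(f"judge returned out-of-range score: {score}")
    return score


def evaluate(judge_model, dataset):
    """Aggregate judge scores over a ground-truth evaluation set of
    (question, answer, reference) triples."""
    scores = [judge_response(judge_model, q, a, ref) for q, a, ref in dataset]
    return sum(scores) / len(scores)
```

The aggregate score becomes one of the metrics fed back into the central observability store, alongside human evaluation results and real-world feedback.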

Responsible AI

To balance the benefits of generative AI with the challenges that arise from it, it’s important to incorporate tools, techniques, and mechanisms that align to a broad set of responsible AI dimensions. At AWS, these Responsible AI dimensions include privacy and security, safety, transparency, explainability, veracity and robustness, fairness, controllability, and governance. Each organization would have its own governing set of responsible AI dimensions that can be centrally incorporated as best practices through the generative AI foundation.

Security and privacy

Communication should be over TLS, and private network access should be supported. User access should be secure, and the system should support fine-grained access control. Rate limiting and throttling should be in place to help prevent abuse. For data security, data should be encrypted at rest and in transit, and tenant data isolation patterns should be implemented. Embeddings stored in vector stores should be encrypted. For model security, custom model weights should be encrypted and isolated for different tenants. Guardrails should be applied to input and output to filter topics and harmful content. Telemetry should be collected for actions that users take on the central system. Data quality is the responsibility of the consuming applications or data producers, and the consuming applications should integrate observability into their applications.
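The placement of guardrails on both input and output can be sketched as below. The keyword-based check is purely illustrative; real deployments use a managed service such as Amazon Bedrock Guardrails or classifier models, and the denied-topic list here is a made-up example policy:

```python
DENIED_TOPICS = {"violence", "self-harm"}  # illustrative policy, not a real list


def apply_guardrail(text, denied=DENIED_TOPICS):
    """Screen text for denied topics. Keyword matching only illustrates
    where the check sits; production systems use classifier models."""
    hits = sorted(t for t in denied if t in text.lower())
    return {"allowed": not hits, "reasons": hits}


def guarded_invoke(model, prompt):
    """Apply guardrails on both the input and the output, as recommended."""
    if not apply_guardrail(prompt)["allowed"]:
        return "Input blocked by guardrail."
    response = model(prompt)
    if not apply_guardrail(response)["allowed"]:
        return "Response blocked by guardrail."
    return response
```

Centralizing this wrapper in the foundation means every consuming application inherits the same policy without reimplementing it.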

Governance

The two key areas of governance are model and data:

Tools landscape

A variety of AWS services, AWS partner solutions, and third-party tools and frameworks are available to architect a comprehensive generative AI foundation. The following figure might not cover the entire gamut of tools, but we have created a landscape based on our experience with these tools.

Operational boundaries

One of the challenges to solve for is who owns the foundational components and how do they operate within the organization’s operating model. Let’s look at three common operating models:

Multi-tenant architecture

Irrespective of the operating model, it’s important to define how tenants are isolated and managed within the system. The multi-tenant pattern depends on a number of factors:

Let’s break this down by taking a RAG application as an example. In the hybrid model, the tenant deployment contains instances of a vector database that stores the embeddings, which supports strict data isolation requirements. The deployment will additionally include the application layer that contains the frontend code and orchestration logic to take the user query, augment the prompt with context from the vector database, and invoke FMs on the central system. The foundational components that offer services such as evaluation and guardrails for applications to consume to build a production-ready application are in a separate shared deployment. Logs, metrics, and traces from the applications can be fed into a central aggregation place.
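The hybrid split described above, with isolated per-tenant stores in front of a shared central gateway, can be sketched as follows. The class and its naive keyword retrieval are hypothetical illustrations of where the isolation boundary sits, not a real architecture component:

```python
class TenantRouter:
    """Hybrid-model sketch: each tenant gets its own vector store instance,
    while model invocation goes through a shared central gateway."""

    def __init__(self, shared_gateway):
        self.shared_gateway = shared_gateway  # central, multi-tenant service
        self.tenant_stores = {}               # tenant_id -> isolated store

    def onboard(self, tenant_id):
        self.tenant_stores[tenant_id] = []    # dedicated store per tenant

    def add_document(self, tenant_id, doc):
        self.tenant_stores[tenant_id].append(doc)

    def query(self, tenant_id, question):
        # Strict data isolation: only this tenant's store is ever searched.
        words = question.lower().split()
        context = " ".join(d for d in self.tenant_stores[tenant_id]
                           if any(w in d.lower() for w in words))
        return self.shared_gateway(f"Context: {context}\nQuestion: {question}")
```

Because retrieval is scoped to `tenant_stores[tenant_id]`, one tenant's documents can never leak into another tenant's prompt, even though both share the same gateway and FMs.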

Generative AI foundation maturity model

We have defined a maturity model to track the evolution of the generative AI foundation across different stages of adoption. The maturity model can be used to assess where you are in the development journey and plan for expansion. We define the curve along four stages of adoption: emerging, advanced, mature, and established.

The details for each stage are as follows:

The evolution might not be exactly linear along the curve in terms of specific capabilities, but certain key performance indicators can be used to evaluate the adoption and growth.

Conclusion

Establishing a comprehensive generative AI foundation can be a critical step in harnessing the power of AI at scale. Enterprise AI development brings unique challenges spanning agility, reliability, governance, scale, and collaboration. Therefore, a well-constructed foundation with the right components, adapted to the operating model of the business, aids in building and scaling generative AI applications across the enterprise.

The rapidly evolving generative AI landscape means there might be cutting-edge tools we haven’t covered under the tools landscape. If you’re using or aware of state-of-the art solutions that align with the foundational components, we encourage you to share them in the comments section.

Our team is dedicated to helping customers solve challenges in generative AI development at scale—whether it’s architecting a generative AI foundation, setting up operational best practices, or implementing responsible AI practices. Leave us a comment and we will be glad to collaborate.


About the authors

Chaitra Mathur is a GenAI Specialist Solutions Architect at AWS. She works with customers across industries in building scalable generative AI platforms and operationalizing them. Throughout her career, she has shared her expertise at numerous conferences and has authored several blogs in the Machine Learning and Generative AI domains.

Dr. Alessandro Cerè is a GenAI Evaluation Specialist and Solutions Architect at AWS. He assists customers across industries and regions in operationalizing and governing their generative AI systems at scale, ensuring they meet the highest standards of performance, safety, and ethical considerations. Bringing a unique perspective to the field of AI, Alessandro has a background in quantum physics and research experience in quantum communications and quantum memories. In his spare time, he pursues his passion for landscape and underwater photography.

Aamna Najmi is a GenAI and Data Specialist at AWS. She assists customers across industries and regions in operationalizing and governing their generative AI systems at scale, ensuring they meet the highest standards of performance, safety, and ethical considerations, bringing a unique perspective of modern data strategies to complement the field of AI. In her spare time, she pursues her passion of experimenting with food and discovering new places.

Dr. Andrew Kane is the WW Tech Leader for Security and Compliance for AWS Generative AI Services, leading the delivery of under-the-hood technical assets for customers around security, as well as working with CISOs around the adoption of generative AI services within their organizations. Before joining AWS at the beginning of 2015, Andrew spent two decades working in the fields of signal processing, financial payments systems, weapons tracking, and editorial and publishing systems. He is a keen karate enthusiast (just one belt away from Black Belt) and is also an avid home-brewer, using automated brewing hardware and other IoT sensors. He was the legal licensee in his ancient (AD 1468) English countryside village pub until early 2020.

Bharathi Srinivasan is a Generative AI Data Scientist at the AWS Worldwide Specialist Organization. She works on developing solutions for Responsible AI, focusing on algorithmic fairness, veracity of large language models, and explainability. Bharathi guides internal teams and AWS customers on their responsible AI journey. She has presented her work at various learning conferences.

Denis V. Batalov is a 17-year Amazon veteran with a PhD in Machine Learning. Denis worked on such exciting projects as Search Inside the Book, Amazon Mobile apps, and Kindle Direct Publishing. Since 2013 he has helped AWS customers adopt AI/ML technology as a Solutions Architect. Currently, Denis is a Worldwide Tech Leader for AI/ML, responsible for the functioning of AWS ML Specialist Solutions Architects globally. Denis is a frequent public speaker; you can follow him on Twitter @dbatalov.

Nick McCarthy is a Generative AI Specialist at AWS. He has worked with AWS clients across various industries including healthcare, finance, sports, telecoms and energy to accelerate their business outcomes through the use of AI/ML. Outside of work he loves to spend time traveling, trying new cuisines and reading about science and technology. Nick has a Bachelors degree in Astrophysics and a Masters degree in Machine Learning.

Alex Thewsey is a Generative AI Specialist Solutions Architect at AWS, based in Singapore. Alex helps customers across Southeast Asia to design and implement solutions with ML and Generative AI. He also enjoys karting, working with open source projects, and trying to keep up with new ML research.

Willie Lee is a Senior Tech PM for the AWS worldwide specialists team focusing on GenAI. He is passionate about machine learning and the many ways it can impact our lives, especially in the area of language comprehension.
