AWS Machine Learning Blog, December 16, 2024
Multi-tenant RAG with Amazon Bedrock Knowledge Bases

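This post explores how to implement multi-tenant patterns for a Retrieval Augmented Generation (RAG) architecture using Amazon Bedrock Knowledge Bases. Independent software vendors (ISVs) building software as a service (SaaS) products face challenges such as data isolation, security, tenant management, and cost efficiency. To address them, the post proposes three multi-tenancy patterns: silo, pool, and bridge. The silo pattern gives each tenant a fully independent resource stack, which provides the strongest isolation at the highest cost. The pool pattern lowers cost by sharing resources but provides weaker isolation. The bridge pattern sits between the two, balancing isolation and cost efficiency. The post analyzes the pros and cons of each pattern and gives concrete guidance for implementing them on Amazon Bedrock Knowledge Bases, Amazon S3, and OpenSearch Service.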

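🗂️ **Tenant isolation**: The core of a multi-tenant system is how it isolates each tenant's data and resources, including data sources, processing pipelines, vector databases, and RAG client applications. The required level of isolation is driven by security, performance, scalability, and regulatory requirements. It may call for a different encryption key per tenant and for preventing one tenant's activity from affecting other tenants.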

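⚙️ **Variability**: Tenants may have different requirements for the RAG system, such as data ingestion frequency, document chunking strategy, or vector search configuration. The system therefore needs enough flexibility to accommodate each tenant's specific needs.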

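🏢 **Tenant management**: A multi-tenant solution needs simple tenant onboarding and offboarding mechanisms. These may involve provisioning or destroying tenant-specific infrastructure as well as managing the tenant's data.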

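💰 **Cost efficiency**: The operating cost of a multi-tenant solution depends on its tenant isolation mechanism, so designing a cost-efficient architecture is essential. The post proposes the silo, pool, and bridge patterns to balance isolation, cost, and management complexity.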

Organizations are continuously seeking ways to use their proprietary knowledge and domain expertise to gain a competitive edge. With the advent of foundation models (FMs) and their remarkable natural language processing capabilities, a new opportunity has emerged to unlock the value of their data assets.

As organizations strive to deliver personalized experiences to customers using generative AI, it becomes paramount to specialize the behavior of FMs using their own—and their customers’—data. Retrieval Augmented Generation (RAG) has emerged as a simple yet effective approach to achieve a desired level of specialization.

Amazon Bedrock Knowledge Bases is a fully managed capability that simplifies the management of the entire RAG workflow. It lets organizations give FMs and agents contextual information from the company's private data sources, so they deliver more relevant and accurate responses tailored to the organization's specific needs.

For organizations developing multi-tenant products, such as independent software vendors (ISVs) creating software as a service (SaaS) offerings, the ability to personalize experiences for each of their customers (tenants in their SaaS application) is particularly significant. This personalization can be achieved by implementing a RAG approach that selectively uses tenant-specific data.

In this post, we discuss and provide examples of how to achieve personalization using Amazon Bedrock Knowledge Bases. We focus particularly on addressing the multi-tenancy challenges that ISVs face, including data isolation, security, tenant management, and cost management. We focus on scenarios where the RAG architecture is integrated into the ISV application and not directly exposed to tenants. Although the specific implementations presented in this post use Amazon OpenSearch Service as a vector database to store tenants’ data, the challenges and architecture solutions proposed can be extended and tailored to other vector store implementations.

Multi-tenancy design considerations

When architecting a multi-tenanted RAG system, organizations need to take several considerations into account:

These four considerations need to be carefully balanced and weighted to suit the needs of the specific solution. In this post, we present a model to simplify the decision-making process. Using the core isolation concepts of silo, pool, and bridge defined in the SaaS Tenant Isolation Strategies whitepaper, we propose three patterns for implementing a multi-tenant RAG solution using Amazon Bedrock Knowledge Bases, Amazon Simple Storage Service (Amazon S3), and OpenSearch Service.

A typical RAG solution using Amazon Bedrock Knowledge Bases is composed of several components, as shown in the following figure:

The main challenge in adapting this architecture for multi-tenancy is deciding how to isolate tenants from each other in each of the components. We propose three prescriptive patterns that cater to different use cases and offer varying levels of isolation, variability, management simplicity, and cost-efficiency. The following figure illustrates the trade-offs between these three architectural patterns in terms of tenant isolation, variability, cost-efficiency, and ease of tenant management.

Multi-tenancy patterns

In this section, we describe the implementation of these three different multi-tenancy patterns in a RAG architecture based on Amazon Bedrock Knowledge Bases, discussing their use cases as well as their pros and cons.

Silo

The silo pattern, illustrated in the following figure, offers the highest level of tenant isolation, because the entire stack is deployed and managed independently for each tenant.

In the context of the RAG architecture implemented by Amazon Bedrock Knowledge Bases, this pattern prescribes the following:

Because the silo pattern offers tenant architectural independence, onboarding and offboarding a tenant means creating and destroying the RAG stack for that tenant, composed of the S3 bucket, knowledge base, and OpenSearch Serverless collection. You would typically do this using infrastructure as code (IaC). Depending on your application architecture, you may also need to update the log sinks and monitoring systems for each tenant.
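
As a rough illustration of what per-tenant IaC has to produce, the following sketch derives the names of one tenant's silo resources: the S3 bucket, the knowledge base, and the OpenSearch Serverless collection. The naming scheme, the `ragapp` prefix, and the `silo_stack_names` helper are hypothetical. The regular expressions encode simplified versions of the S3 bucket and OpenSearch Serverless collection naming rules as we understand them, so confirm them against the service documentation before relying on them.

```python
import re

# Simplified OpenSearch Serverless collection name rule (assumption, verify in docs):
# 3-32 characters, starts with a lowercase letter, then lowercase letters, digits, hyphens.
COLLECTION_NAME_RE = re.compile(r"^[a-z][a-z0-9-]{2,31}$")
# Simplified S3 bucket name rule (assumption, verify in docs): 3-63 characters of
# lowercase letters, digits, hyphens, and dots, starting and ending with a letter or digit.
BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def silo_stack_names(tenant_id: str, app: str = "ragapp") -> dict:
    """Derive the names of one tenant's silo resources (hypothetical naming scheme)."""
    # Normalize the tenant ID into a lowercase slug safe for resource names
    slug = re.sub(r"[^a-z0-9-]", "-", tenant_id.lower()).strip("-")
    names = {
        "bucket": f"{app}-{slug}-docs",
        "knowledge_base": f"{app}-{slug}-kb",
        "collection": f"{app}-{slug}-vec",
    }
    # Fail fast at onboarding time rather than halfway through provisioning
    if not BUCKET_NAME_RE.fullmatch(names["bucket"]):
        raise ValueError(f"invalid bucket name: {names['bucket']}")
    if not COLLECTION_NAME_RE.fullmatch(names["collection"]):
        raise ValueError(f"invalid collection name: {names['collection']}")
    return names
```

Deriving every name from the tenant ID keeps onboarding and offboarding symmetric: the offboarding job can destroy exactly the resources that onboarding created for that tenant.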

Although the silo pattern offers the highest level of tenant isolation, it is also the most expensive to implement, mainly due to creating a separate OpenSearch Serverless collection per tenant for the following reasons:

When choosing the silo pattern, note that a maximum of 100 knowledge bases are supported in each AWS account. This makes the silo pattern favorable for your largest tenants with specific isolation requirements. Having a separate knowledge base per tenant also reduces the impact of quotas on concurrent ingestion jobs (maximum one concurrent job per KB, five per account), job size (100 GB per job), and data sources (maximum of 5 million documents per data source). It also improves the performance fairness as perceived by your tenants.
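
Because each silo tenant has its own knowledge base, syncing every tenant in an account is bounded by the account-wide ingestion concurrency quoted above. The sketch below plans waves of at most five jobs, with each knowledge base appearing only once per wave. It assumes a single account and uses the quota values from this post, which may change, so check Service Quotas for current values. The `ingestion_waves` helper is hypothetical. Starting each job, for example with the `bedrock-agent` client's `start_ingestion_job`, and waiting for a wave to finish are left out.

```python
from typing import Iterable, List

# Per-account concurrent ingestion job quota as quoted in this post (verify in Service Quotas)
MAX_CONCURRENT_JOBS_PER_ACCOUNT = 5


def ingestion_waves(kb_ids: Iterable[str],
                    per_account: int = MAX_CONCURRENT_JOBS_PER_ACCOUNT) -> List[List[str]]:
    """Group knowledge bases into waves of ingestion jobs that may run concurrently.

    Duplicates are dropped, so each knowledge base gets at most one job (one
    concurrent job per KB), and each wave holds at most `per_account` jobs.
    """
    unique = list(dict.fromkeys(kb_ids))  # de-duplicate while keeping order
    return [unique[i:i + per_account] for i in range(0, len(unique), per_account)]
```
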
Deleting a knowledge base when offboarding a tenant might be time-consuming, depending on the size of the data sources and the synchronization process. To mitigate this, you can set the data deletion policy in your tenants' knowledge bases to RETAIN. With this setting, deleting the knowledge base does not delete your tenants' data from the OpenSearch Service index. You can then remove the index by deleting the tenant's OpenSearch Serverless collection.
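
The following is a minimal sketch of the data source request for a silo tenant, with the deletion policy set to `RETAIN`. It only builds the keyword arguments for the `bedrock-agent` client's `create_data_source` call, so it runs without AWS credentials. The helper name, the default data source name, and the standard `aws` partition in the bucket ARN are assumptions.

```python
def silo_data_source_request(knowledge_base_id: str, bucket_name: str,
                             name: str = "tenant-documents") -> dict:
    """Build kwargs for bedrock-agent create_data_source (hypothetical helper).

    In the silo pattern the whole tenant bucket is the data source. RETAIN keeps
    the vectors in the OpenSearch index when the data source or knowledge base is
    deleted; offboarding removes them by deleting the tenant's collection instead.
    """
    return {
        "knowledgeBaseId": knowledge_base_id,
        "name": name,
        "dataSourceConfiguration": {
            "type": "S3",
            # Assumes the standard "aws" partition
            "s3Configuration": {"bucketArn": f"arn:aws:s3:::{bucket_name}"},
        },
        "dataDeletionPolicy": "RETAIN",
    }

# Usage (requires AWS credentials):
# import boto3
# boto3.client("bedrock-agent").create_data_source(
#     **silo_data_source_request("<YOUR_KNOWLEDGEBASE_ID>", "<TENANT_BUCKET>"))
```
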

Pool

In contrast with the silo pattern, in the pool pattern, illustrated in the following figure, the whole end-to-end RAG architecture is shared by your tenants, making it particularly suitable to accommodate many small tenants.

The pool pattern prescribes the following:

```json
{
  "metadataAttributes" : {
    "tenantId" : "tenant_1",
    ...
  }
}
```

In the preceding JSON structure, the key tenantId is only an example, and you can replace it with whichever key you want to use to express tenancy. The tenancy field is used at runtime to filter documents belonging to a specific tenant, so the filtering key used at runtime must match the metadata key in the JSON used to index the documents. You can also include other metadata keys to perform additional filtering that isn't based on tenancy. If you don't upload the object.metadata.json file for a document, the client application won't be able to find that document using metadata filtering.
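
To make the metadata file convention concrete, the following sketch builds the metadata file for one document. It assumes the Amazon Bedrock Knowledge Bases convention that the metadata file sits next to the source object and is named after it with `.metadata.json` appended. The helper name and the optional `extra` attributes are hypothetical.

```python
import json
from typing import Optional, Tuple


def tenant_metadata_sidecar(object_key: str, tenant_id: str,
                            extra: Optional[dict] = None,
                            tenant_key: str = "tenantId") -> Tuple[str, str]:
    """Return (key, body) of the .metadata.json file to upload next to object_key.

    tenant_key must match the key used by the runtime retrieval filter.
    """
    attributes = dict(extra or {})
    # Set tenancy last so that extra attributes can never override it
    attributes[tenant_key] = tenant_id
    body = json.dumps({"metadataAttributes": attributes})
    return f"{object_key}.metadata.json", body

# Usage (requires AWS credentials): upload the document and its metadata file together
# s3 = boto3.client("s3")
# key, body = tenant_metadata_sidecar("docs/report.pdf", "tenant_1")
# s3.put_object(Bucket="<POOL_BUCKET>", Key=key, Body=body)
```
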

```python
import boto3

bedrock_agent_runtime = boto3.client(service_name="bedrock-agent-runtime")

# Only retrieve chunks whose metadata attribute tenantId equals "tenant_1"
tenant_filter = {
    "equals": {
        "key": "tenantId",
        "value": "tenant_1"
    }
}

retrievalConfiguration = {
    "vectorSearchConfiguration": {
        "filter": tenant_filter
    }
}

response = bedrock_agent_runtime.retrieve_and_generate(
    input={
        "text": "The original user query"
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "<YOUR_KNOWLEDGEBASE_ID>",
            "modelArn": "<FM_ARN>",
            "retrievalConfiguration": retrievalConfiguration
        }
    }
)

# Generated answer, grounded only in the filtered tenant's documents
print(response["output"]["text"])
```

In the preceding code, text contains the original user query to be answered. Replace <YOUR_KNOWLEDGEBASE_ID> with the identifier of the knowledge base that pools your tenants' documents, and replace <FM_ARN> with the Amazon Resource Name (ARN) of the Amazon Bedrock model you want to use to answer the user query. The client has been streamlined to highlight the tenant filtering functionality. In production, we recommend implementing session and error handling, logging, and retry logic. We also recommend separating the tenant filtering logic from the client invocation so that developers cannot access or change it.
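
One way to keep tenant filtering out of developers' hands is a small helper that always builds the filter from the authenticated tenant. Callers can only add conditions through `andAll`, so an extra filter can narrow the results but never reach another tenant's documents. This is a sketch under our own assumptions, and the helper name is hypothetical. `equals`, `andAll`, and `vectorSearchConfiguration` are retrieval filter fields of the Amazon Bedrock `Retrieve` and `RetrieveAndGenerate` APIs.

```python
from typing import Optional


def tenant_retrieval_configuration(tenant_id: str,
                                   extra_filter: Optional[dict] = None,
                                   tenant_key: str = "tenantId") -> dict:
    """Build a retrievalConfiguration that always scopes results to one tenant.

    tenant_id should come from the authenticated session, never from user input.
    """
    tenant_filter = {"equals": {"key": tenant_key, "value": tenant_id}}
    # AND-ing keeps the tenant condition mandatory whatever extra_filter contains
    combined = (tenant_filter if extra_filter is None
                else {"andAll": [tenant_filter, extra_filter]})
    return {"vectorSearchConfiguration": {"filter": combined}}
```

The returned dictionary can be passed as `retrievalConfiguration` in a `retrieve_and_generate` call like the one shown earlier.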

Because the end-to-end architecture is pooled in this pattern, onboarding and offboarding a tenant doesn't require you to create new physical or logical constructs. It's as simple as starting or stopping the upload of tenant-specific documents, and their metadata files, to Amazon S3. This also means that there is no AWS managed API you can use to offboard a tenant and forget it end to end. To delete the historical documents belonging to a specific tenant, delete the relevant objects in Amazon S3 and run an ingestion job so that the deletions are synchronized to the vector index. Typically, customers have an external application that maintains the list of available tenants and their status, facilitating the onboarding and offboarding process.
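
If you don't keep a separate index of which S3 objects belong to which tenant, one option is to derive that list from the metadata files themselves. The following hypothetical helper assumes you have already read the `.metadata.json` bodies into a dictionary, and it returns the document and metadata keys to delete for one tenant. You could then delete them, for example with S3 `delete_objects`, which accepts up to 1,000 keys per request, and start a new ingestion job to remove the corresponding vectors.

```python
import json
from typing import Dict, List

SUFFIX = ".metadata.json"


def tenant_keys_to_delete(sidecars: Dict[str, str], tenant_id: str,
                          tenant_key: str = "tenantId") -> List[str]:
    """Return the document and metadata file keys that belong to tenant_id.

    `sidecars` maps each metadata file key to its JSON body (read from S3 beforehand).
    """
    keys: List[str] = []
    for sidecar_key, body in sidecars.items():
        if not sidecar_key.endswith(SUFFIX):
            continue
        attributes = json.loads(body).get("metadataAttributes", {})
        if attributes.get(tenant_key) == tenant_id:
            keys.append(sidecar_key[: -len(SUFFIX)])  # the source document
            keys.append(sidecar_key)                  # its metadata file
    return keys
```
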

Sharing the monitoring system and logging capabilities in this pattern reduces the complexity of operations with a large number of tenants. However, it requires you to collect the tenant-specific metrics from the client side to perform specific tenant attribution.
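
The following is a minimal sketch of client-side attribution. It assumes every query goes through a single wrapper that knows the caller's tenant. `TenantMetrics` is a hypothetical in-process helper. In practice you would publish these values to your monitoring system, for example as Amazon CloudWatch metrics with a tenant dimension, rather than keep them in memory.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class TenantMetrics:
    """In-process counters for attributing pooled RAG usage to tenants (hypothetical)."""

    def __init__(self) -> None:
        self.calls: Dict[str, int] = defaultdict(int)
        self.errors: Dict[str, int] = defaultdict(int)
        self.latency_s: Dict[str, float] = defaultdict(float)

    def record(self, tenant_id: str, fn: Callable[[], T]) -> T:
        """Run fn (for example, a retrieve_and_generate call) and attribute it to tenant_id."""
        start = time.perf_counter()
        try:
            return fn()
        except Exception:
            self.errors[tenant_id] += 1
            raise
        finally:
            # Runs for both successful and failed calls
            self.calls[tenant_id] += 1
            self.latency_s[tenant_id] += time.perf_counter() - start
```
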

The pool pattern optimizes the end-to-end cost of your RAG architecture, because sharing OpenSearch Compute Units (OCUs) across tenants maximizes the use of each OCU and minimizes idle time. However, because tenants share the same pool of OCUs, this pattern doesn't offer performance isolation at the vector store level, so the largest and most active tenants might affect the experience of other tenants.

When choosing the pool pattern for your RAG architecture, you should be aware that a single ingestion job can ingest or delete a maximum of 100 GB. Additionally, the data source can have a maximum of 5 million documents. If the solution has many tenants that are geographically distributed, consider triggering the ingestion job multiple times a day so you don’t hit the ingestion job size limit. Also, depending on the number and size of your documents to be synchronized, the time for ingestion will be determined by the embedding model invocation rate. For example, consider the following scenario:

This would result in the following:

This means you could trigger an ingestion job 12 times per day to have a good time distribution of data to be ingested. This calculation is a best-case scenario and doesn’t account for the latency introduced by the FM when creating the vector from the chunk. If you expect having to synchronize a large number of tenants at the same time, consider using provisioned throughput to decrease the time it takes to create vector embeddings. This approach will also help distribute the load on the embedding models, limiting throttling of the Amazon Bedrock runtime API calls.
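
The arithmetic behind this kind of scheduling can be sketched as follows. The functions are hypothetical and take your own inputs: daily volume, chunk count, and sustained embedding rate. The post's scenario figures are not reproduced here. Like the reasoning above, the sketch is best case, applies the 100 GB per-job limit, and ignores model latency and throttling.

```python
import math


def min_jobs_per_day(daily_volume_gb: float, max_job_gb: float = 100.0) -> int:
    """Fewest daily ingestion jobs that keep every job under the per-job size limit."""
    return max(1, math.ceil(daily_volume_gb / max_job_gb))


def embedding_hours(num_chunks: int, embeddings_per_minute: float) -> float:
    """Best-case time to embed num_chunks at a sustained invocation rate."""
    return num_chunks / embeddings_per_minute / 60.0


def job_interval_hours(jobs_per_day: int) -> float:
    """Spacing between evenly distributed ingestion jobs over 24 hours."""
    return 24.0 / jobs_per_day
```

For instance, `job_interval_hours(12)` gives one job every 2 hours. A schedule is only feasible if each job's `embedding_hours` fits within that interval.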

Bridge

The bridge pattern, illustrated in the following figure, offers a middle ground between the silo and pool patterns, balancing tenant data isolation and security against cost efficiency.

The bridge pattern delivers the following characteristics:

The bridge pattern supports up to 100 tenants, because each tenant has its own knowledge base and an AWS account supports a maximum of 100 knowledge bases. Onboarding and offboarding a tenant requires creating and deleting a knowledge base and an OpenSearch Service vector index. To delete the data belonging to a particular tenant, delete the created resources and use the tenant-specific prefix as a logical parameter in your Amazon S3 API calls. Unlike the silo pattern, the bridge pattern doesn't allow per-tenant end-to-end encryption. However, it offers the same level of tenant customization as the silo pattern while optimizing costs.
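
On the query side, the external tenant mapping table that the bridge pattern needs can be as simple as the following sketch. `TenantRegistry`, `TenantRoute`, and the `tenants/<tenant_id>/` prefix layout are hypothetical. The 100-tenant cap mirrors the per-account knowledge base quota quoted earlier; in a real account, knowledge bases used for other purposes also count toward it. The stored prefix is what you would set in the S3 data source's `inclusionPrefixes`, so that each tenant's knowledge base only ingests its own path.

```python
from dataclasses import dataclass
from typing import Dict

# Per-account knowledge base quota as quoted in this post (verify in Service Quotas)
MAX_KNOWLEDGE_BASES_PER_ACCOUNT = 100


@dataclass(frozen=True)
class TenantRoute:
    knowledge_base_id: str
    s3_prefix: str  # hypothetical layout, e.g. "tenants/tenant_1/"


class TenantRegistry:
    """Hypothetical external mapping table from tenant ID to its knowledge base."""

    def __init__(self) -> None:
        self._routes: Dict[str, TenantRoute] = {}

    def onboard(self, tenant_id: str, knowledge_base_id: str) -> TenantRoute:
        if tenant_id in self._routes:
            raise ValueError(f"tenant already onboarded: {tenant_id}")
        if len(self._routes) >= MAX_KNOWLEDGE_BASES_PER_ACCOUNT:
            raise RuntimeError("knowledge base quota reached for this account")
        route = TenantRoute(knowledge_base_id, f"tenants/{tenant_id}/")
        self._routes[tenant_id] = route
        return route

    def offboard(self, tenant_id: str) -> TenantRoute:
        # Returns the route so the caller can delete the KB, index, and S3 prefix
        return self._routes.pop(tenant_id)

    def knowledge_base_for(self, tenant_id: str) -> str:
        """Resolve the knowledge base to query; unknown tenants are rejected."""
        try:
            return self._routes[tenant_id].knowledge_base_id
        except KeyError:
            raise PermissionError(f"unknown tenant: {tenant_id}") from None
```
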

Summary of differences

The following figure and table provide a consolidated view for comparing the characteristics of the different multi-tenant RAG architecture patterns. This comprehensive overview highlights the key attributes and trade-offs associated with the pool, bridge, and silo patterns, enabling informed decision-making based on specific requirements.

The following figure illustrates the mapping of design characteristics to components of the RAG architecture.

The following table summarizes the characteristics of the multi-tenant RAG architecture patterns.

| Characteristic | Attribute of | Pool | Bridge | Silo |
|---|---|---|---|---|
| Per-tenant chunking strategy | Amazon Bedrock Knowledge Base Data Source | No | Yes | Yes |
| Customer managed key for encryption of transient data and at rest | Amazon Bedrock Knowledge Base Data Source | No | No | Yes |
| Per-tenant distance measure | Amazon OpenSearch Service Index | No | Yes | Yes |
| Per-tenant ANN index configuration | Amazon OpenSearch Service Index | No | Yes | Yes |
| Per-tenant data deletion policies | Amazon Bedrock Knowledge Base Data Source | No | Yes | Yes |
| Per-tenant vector size | Amazon Bedrock Knowledge Base Data Source | No | Yes | Yes |
| Tenant performance isolation | Vector database | No | No | Yes |
| Tenant onboarding and offboarding complexity | Overall solution | Simplest, requires management of new tenants in existing infrastructure | Medium, requires minimal management of end-to-end infrastructure | Hardest, requires management of end-to-end infrastructure |
| Query client implementation | Original Data Source | Medium, requires dynamic filtering | Hardest, requires external tenant mapping table | Simplest, same as single-tenant implementation |
| Amazon S3 tenant management complexity | Amazon S3 buckets and objects | Hardest, need to maintain tenant-specific metadata files for each object | Medium, each tenant needs a different S3 path | Simplest, each tenant requires a different S3 bucket |
| Cost | Vector database | Lowest | Medium | Highest |
| Per-tenant FM used to create vector embeddings | Amazon Bedrock Knowledge Base | No | Yes | Yes |
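
As a rough aid to reading the table, the following sketch encodes its Yes/No rows as a decision rule that returns the cheapest pattern meeting a set of requirements. It is our simplification, not a prescribed algorithm. It ignores tenant count, service quotas, and the management-complexity rows, and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantRequirements:
    performance_isolation: bool = False
    per_tenant_kms_key: bool = False          # customer managed key per tenant
    per_tenant_chunking: bool = False
    per_tenant_index_config: bool = False     # distance measure, ANN settings, vector size
    per_tenant_embedding_model: bool = False
    per_tenant_deletion_policy: bool = False


def cheapest_pattern(req: TenantRequirements) -> str:
    """Return the lowest-cost pattern whose table column satisfies every requirement."""
    # Only the silo column answers Yes to these rows
    if req.performance_isolation or req.per_tenant_kms_key:
        return "silo"
    # Bridge and silo answer Yes to these rows; bridge costs less
    if (req.per_tenant_chunking or req.per_tenant_index_config
            or req.per_tenant_embedding_model or req.per_tenant_deletion_policy):
        return "bridge"
    return "pool"
```
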

Conclusion

This post explored three distinct patterns for implementing a multi-tenant RAG architecture using Amazon Bedrock Knowledge Bases and OpenSearch Service. The silo, pool, and bridge patterns offer varying levels of tenant isolation, variability, management simplicity, and cost-efficiency, catering to different use cases and requirements. By understanding the trade-offs and considerations associated with each pattern, organizations can make informed decisions and choose the approach that best aligns with their needs.

Get started with Amazon Bedrock Knowledge Bases today.


About the Authors

Emanuele Levi is a Solutions Architect in the Enterprise Software and SaaS team, based in London. Emanuele helps UK customers on their journey to refactor monolithic applications into modern microservices SaaS architectures. Emanuele is mainly interested in event-driven patterns and designs, especially when applied to analytics and AI, where he has expertise in the fraud-detection industry.

Mehran Nikoo is a Generative AI Go-To-Market Specialist at AWS. He leads the generative AI go-to-market strategy for UK and Ireland.

Dani Mitchell is a Generative AI Specialist Solutions Architect at AWS. He is focused on computer vision use cases and helps AWS customers in EMEA accelerate their machine learning and generative AI journeys with Amazon SageMaker and Amazon Bedrock.
