AWS Machine Learning Blog, July 25, 2024
Discover insights from Amazon S3 with Amazon Q S3 connector 

 

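This article explains how to use Amazon Q Business and ACLs to build a secure search application that extracts information from Amazon S3. Amazon Q Business lets users query content stored in S3 in natural language, while ACLs keep the data secure. The article also introduces ACL crawling and identity crawling, and shows how these techniques ensure that only authorized users can access the relevant documents.

**Amazon Q Business:** Amazon Q Business is a fully managed, generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. It provides native data source connectors that index content into a built-in retriever, and it uses a large language model (LLM) to provide accurate, well-written answers.

**ACL crawling:** Amazon Q Business supports ACL crawling, which means it can recognize a document's ACL and ensure that only authorized users can access that document. The ACL is stored in each document's metadata file and contains user email addresses, local groups, and other related information.

**Identity crawling:** When users query Amazon Q Business, their identity credentials (such as an email address) are passed along with the query so that Amazon Q Business can determine whether they are authorized to access the relevant documents. If a user's credentials are not found in the ACL, Amazon Q Business tries to map them to the local aliases and local groups in the document ACL.

**Solution overview:** To set up a generative AI chat application, an administrator creates an Amazon Q application, connects it to different data sources, and finally deploys a web experience. The Amazon Q web experience is the chat interface created with the Amazon Q application. Users can chat with their organization's Amazon Q web experience, which can be integrated with IAM Identity Center.

**Architecture diagram:** The article provides an architecture diagram showing how Amazon Q Business, Amazon S3, and IAM Identity Center interact. The diagram illustrates how ACL information is passed from S3 to Amazon Q Business and how authentication is used to keep data secure.

**Prerequisites:** The article lists the prerequisites for building the secure search application, including an AWS account, Amazon S3 and IAM Identity Center permissions, and privileges to create an Amazon Q application, AWS resources, and AWS Identity and Access Management (IAM) roles and policies.

**Prepare an S3 bucket as a data source:** The article describes in detail how to prepare an S3 bucket as a data source, including how to upload documents to the bucket, how to organize them, and how to use ACLs to control access to them.

**Secure querying:** The article discusses the concept of secure querying and how Amazon Q Business uses ACLs and identity crawling to ensure that only authorized users can access the relevant documents.

**Restricting document access with ACLs:** The article also explains how to use ACLs to restrict access to documents and how to integrate ACLs with Amazon Q Business so that only authorized users can access the relevant content.

**Advantages of Amazon Q Business:** Amazon Q Business offers a simple, secure, and scalable way to build a secure search application that extracts information from Amazon S3. It supports multiple data source connectors and provides features such as ACL crawling and identity crawling to keep data secure.

**Conclusion:** The article shows how to use Amazon Q Business and ACLs to build a secure search application that extracts information from Amazon S3. This helps enterprises build trustworthy, secure search applications that meet their specific business needs.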

Amazon Q is a fully managed, generative artificial intelligence (AI) powered assistant that you can configure to answer questions, provide summaries, generate content, gain insights, and complete tasks based on data in your enterprise. The enterprise data required for these generative AI-powered assistants can reside in varied repositories across your organization. One common repository to store data is Amazon Simple Storage Service (Amazon S3), which is an object storage service that stores data as objects within storage buckets. Customers of all sizes and industries can securely index data from a variety of data sources such as document repositories, web sites, content management systems, customer relationship management systems, messaging applications, databases, and so on.

To build a generative AI-based conversational application that’s integrated with the data sources that contain the relevant content, an enterprise needs to invest time, money, and people. First, you need to build connectors to the data sources. Next, you need to index the data to make it available for a Retrieval Augmented Generation (RAG) approach, where relevant passages are delivered with high accuracy to a large language model (LLM). To do this, you need to select an index that provides the capabilities to index the content for semantic and vector search, build the infrastructure to retrieve the data, rank the answers, and build a feature-rich web application. You also need to hire and staff a large team to build, maintain, and manage such a system.

Amazon Q Business is a fully managed generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Amazon Q Business can help you get fast, relevant answers to pressing questions, solve problems, generate content, and take actions using the data and expertise found in your company’s information repositories, code, and enterprise systems such as Atlassian Jira and others. To do this, Amazon Q provides native data source connectors that can index content into a built-in retriever and uses an LLM to provide accurate, well-written answers. A data source connector within Amazon Q helps to integrate and synchronize data from multiple repositories into one index.

Amazon Q Business offers prebuilt connectors to a large number of data sources, including Atlassian Jira, Atlassian Confluence, Amazon S3, Microsoft SharePoint, Salesforce, and many more, and can help you create your generative AI solution with minimal configuration. For a full list of Amazon Q supported data source connectors, see Amazon Q connectors.

Now you can use the Amazon Q S3 connector to index your data on S3 and build a generative AI assistant that can derive insights from the data stored. Amazon Q generates comprehensive responses to natural language queries from users by analyzing information across content that it has access to. Amazon Q also supports access control for your data so that the right users can access the right content. Its responses to questions are based on the content that your end user has permissions to access.

This post shows how to configure the Amazon Q S3 connector and derive insights by creating a generative-AI powered conversation experience on AWS using Amazon Q while using access control lists (ACLs) to restrict access to documents based on user permissions.

Finding accurate answers from content in S3 using Amazon Q Business

After you integrate Amazon Q Business with Amazon S3, users can ask questions about the content stored in S3. For example, a user might ask about the main points discussed in a blog post on cloud security, the installation steps outlined in a user guide, findings from a case study on hybrid cloud usage, market trends noted in an analyst report, or key takeaways from a whitepaper on data encryption. This integration helps users to quickly find the specific information they need, improving their understanding and ability to make informed business decisions.

Secure querying with ACL crawling and identity crawling

Secure querying is when a user runs a query and is returned answers from documents that the user has access to and not from documents that the user does not have access to. To enable users to do secure querying, Amazon Q Business honors ACLs of the documents. Amazon Q Business does this by first supporting the indexing of ACLs. Indexing documents with ACLs is crucial for maintaining data security, because documents without ACLs are treated as public. Second, at query time the user’s credentials (email address) are passed along with the query so that only answers from documents that are relevant to the query and that the user is authorized to access are displayed.
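The filtering rule described above can be sketched in a few lines of Python. This is an illustrative model only, not Amazon Q Business's implementation: the `Document` class, the function names, the document titles, and Pat's email address are hypothetical, and the rule that an explicit DENY outranks an ALLOW is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    # Each entry mirrors the metadata format: {"Access", "Name", "Type"}.
    acl: list = field(default_factory=list)


def is_authorized(doc, email, groups):
    """Return True if the user may see answers drawn from this document."""
    if not doc.acl:
        # Documents indexed without an ACL are treated as public.
        return True
    principals = {("USER", email)} | {("GROUP", g) for g in groups}
    matches = [e for e in doc.acl if (e["Type"], e["Name"]) in principals]
    # Assumption: an explicit DENY outranks any ALLOW for the same user.
    if any(e["Access"] == "DENY" for e in matches):
        return False
    return any(e["Access"] == "ALLOW" for e in matches)


def secure_query(docs, email, groups):
    """Keep only the documents the caller is allowed to read."""
    return [d.title for d in docs if is_authorized(d, email, groups)]


# Hypothetical documents: one public, two restricted to a group.
docs = [
    Document("cloud-security-blog"),
    Document("setup-user-guide",
             [{"Access": "ALLOW", "Name": "customer", "Type": "GROUP"}]),
    Document("encryption-whitepaper",
             [{"Access": "ALLOW", "Name": "AWS-SA", "Type": "GROUP"}]),
]

print(secure_query(docs, "pat_candella@example.com", ["customer"]))
```

In this sketch a member of the customer group sees the public blog and the user guide but not the whitepaper, which matches the behavior the post describes for documents with and without ACLs.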

A document’s ACL, included in the metadata.json or acl.json files alongside the document in the S3 bucket, contains details such as the user’s email address and local groups.

When a user signs in to a web application to conduct a search, their credentials (such as an email address) need to match what’s in the ACL of the document to return results from that document. The web application that the user uses to retrieve answers would be connected to an identity provider (IdP) or the AWS IAM Identity Center. The user’s credentials from the IdP or IAM Identity Center are referred to here as the federated user credentials. The federated user credentials are passed along with the query so that Amazon Q can return the answers from the documents that this user has access to. However, there are occasions when a user’s federated credentials might be absent from the S3 bucket ACLs. In these instances, only the user’s local alias and local groups are specified in the document’s ACL. Therefore, it’s necessary to map these federated user credentials to the corresponding local user alias and local group in the document’s ACL.
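To make the mapping step concrete, here is a minimal Python sketch of resolving a federated identity to the local alias and local groups that appear in a document's ACL. The `IDENTITY_MAP` table, the aliases, Pat's email address, and the function names are all hypothetical; the sketch only illustrates the idea and does not describe how Amazon Q Business stores or resolves identities.

```python
# Hypothetical mapping from federated identities (IdP or IAM Identity Center
# email addresses) to the local aliases and groups used in document ACLs.
IDENTITY_MAP = {
    "mary_major@example.com": {"alias": "mmajor", "groups": ["AWS-SA"]},
    "pat_candella@example.com": {"alias": "pcandella", "groups": ["customer"]},
}


def resolve_principals(federated_email):
    """Return every (type, name) pair that may match an ACL entry for this user."""
    principals = {("USER", federated_email)}
    local = IDENTITY_MAP.get(federated_email)
    if local:
        principals.add(("USER", local["alias"]))
        principals.update(("GROUP", g) for g in local["groups"])
    return principals


def can_access(acl, federated_email):
    """True if an ALLOW entry names the user, their alias, or one of their groups."""
    principals = resolve_principals(federated_email)
    return any(
        e["Access"] == "ALLOW" and (e["Type"], e["Name"]) in principals
        for e in acl
    )


# The ACL names only a local group, not the federated email address.
acl = [{"Access": "ALLOW", "Name": "AWS-SA", "Type": "GROUP"}]
print(can_access(acl, "mary_major@example.com"))
print(can_access(acl, "pat_candella@example.com"))
```

Mary's email address never appears in the ACL, yet she is granted access because her federated identity resolves to the AWS-SA local group.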

Any document or folder without an explicit ACL Deny clause is treated as public.

Solution overview

As an administrator user of Amazon Q, the high-level steps to set up a generative AI chat application are to create an Amazon Q application, connect to different data sources, and finally deploy your web experience. An Amazon Q web experience is the chat interface that you create using your Amazon Q application. Then, your users can chat with your organization’s Amazon Q web experience, and it can be integrated with IAM Identity Center. You can configure and customize your Amazon Q web experience using either the AWS Management Console for Amazon Q or the Amazon Q API.

Amazon Q understands and respects your existing identities, roles, and permissions and uses this information to personalize its interactions. If a user doesn’t have permission to access data without Amazon Q, they can’t access it using Amazon Q either. The following table outlines which documents each user is authorized to access for our use case. The documents being used in this example are a subset of AWS public documents. In this blog post, we will focus on users Arnav (Guest), Mary, and Pat and their assigned groups.

| # | First name | Last name | Group | Document types authorized for access |
|---|---|---|---|---|
| 1 | Arnav | Desai | | Blogs |
| 2 | Pat | Candella | Customer | Blogs, user guides |
| 3 | Jane | Doe | Sales | Blogs, user guides, and case studies |
| 4 | John | Stiles | Marketing | Blogs, user guides, case studies, and analyst reports |
| 5 | Mary | Major | Solutions architect | Blogs, user guides, case studies, analyst reports, and whitepapers |

Architecture diagram

The following diagram illustrates the solution architecture. Amazon S3 is the data source and documents along with the ACL information are passed to Amazon Q from S3. The user submits a query to the Amazon Q application. Amazon Q retrieves the user and group information and provides answers based on the documents that the user has access to.

In the upcoming sections, we will show you how to implement this architecture.

Prerequisites

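For this walkthrough, you should have the following prerequisites:

    An AWS account
    Amazon S3 and IAM Identity Center permissions
    Privileges to create an Amazon Q application, AWS resources, and AWS Identity and Access Management (IAM) roles and policies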

Prepare your S3 bucket as a data source

In the AWS Region list, choose US East (N. Virginia) as the Region. You can choose any Region that Amazon Q is available in but ensure that you remain in the same Region when creating all other resources. To prepare an S3 bucket as a data source, create an S3 bucket. Note the name of the S3 bucket. Replace <REPLACE-WITH-NAME-OF-S3-BUCKET> with the name of the bucket in the commands below. In a terminal with the AWS Command Line Interface (AWS CLI) or AWS CloudShell, run the following commands to upload the documents to the data source bucket:

    aws s3 cp s3://aws-ml-blog/artifacts/building-a-secure-search-application-with-access-controls-kendra/docs.zip .
    unzip docs.zip
    aws s3 cp Data/ s3://<REPLACE-WITH-NAME-OF-S3-BUCKET>/Data/ --recursive
    aws s3 cp Meta/ s3://<REPLACE-WITH-NAME-OF-S3-BUCKET>/Meta/ --recursive
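If you prefer to generate the commands rather than edit the placeholder by hand, the following Python sketch substitutes a bucket name into the four commands and rejects names that break the basic S3 naming rules. The helper name `render_commands` and the example bucket name are hypothetical, and the regular expression is a partial check only (it does not cover every naming rule, such as the ban on adjacent dots).

```python
import re

PLACEHOLDER = "<REPLACE-WITH-NAME-OF-S3-BUCKET>"
COMMANDS = [
    "aws s3 cp s3://aws-ml-blog/artifacts/"
    "building-a-secure-search-application-with-access-controls-kendra/docs.zip .",
    "unzip docs.zip",
    f"aws s3 cp Data/ s3://{PLACEHOLDER}/Data/ --recursive",
    f"aws s3 cp Meta/ s3://{PLACEHOLDER}/Meta/ --recursive",
]

# Partial check of S3 bucket naming rules: 3-63 characters, lowercase letters,
# digits, dots, and hyphens, starting and ending with a letter or digit.
BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def render_commands(bucket):
    """Return the upload commands with the placeholder replaced by bucket."""
    if not BUCKET_NAME.fullmatch(bucket):
        raise ValueError(f"not a valid S3 bucket name: {bucket!r}")
    return [cmd.replace(PLACEHOLDER, bucket) for cmd in COMMANDS]


# "my-q-business-docs" is a made-up bucket name for illustration.
for line in render_commands("my-q-business-docs"):
    print(line)
```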

The documents being queried are stored in an S3 bucket. Each document type has a separate folder: blogs, case studies, analyst reports, user guides, and whitepapers. These folders are contained in a folder named Data.

Each object in S3 is considered a single document. Any <object-name>.metadata.json file and access control list (ACL) file is considered metadata for the object it’s associated with and not treated as a separate document. In this example, metadata files including the ACLs are in a folder named Meta. We use the Amazon Q S3 connector to configure this S3 bucket as the data source. When the data source is synced with the Amazon Q index, it crawls and indexes all documents and collects the ACLs and document attributes from the metadata files. To learn more about ACLs using metadata files, see Amazon S3 document metadata. Here’s the sample metadata JSON file:

    {
       "Attributes": {
          "DocumentType": "user-guides"
       },
       "AccessControlList": [
          { "Access": "ALLOW", "Name": "customer", "Type": "GROUP" },
          { "Access": "ALLOW", "Name": "AWS-Sales", "Type": "GROUP" },
          { "Access": "ALLOW", "Name": "AWS-Marketing", "Type": "GROUP" },
          { "Access": "ALLOW", "Name": "AWS-SA", "Type": "GROUP" }
       ]
    }
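Metadata files with this shape can also be generated programmatically. The Python sketch below builds the metadata body for each document type. Only the user-guides entry is confirmed by the sample file; the group lists for the other document types, the document type names other than `user-guides`, and the choice to give blogs no ACL (so that a user without a group can still read them) are assumptions inferred from the access table earlier in this post.

```python
import json

# Assumption: groups per document type, inferred from the access table in this
# post. Only "user-guides" is confirmed by the sample metadata file; blogs get
# no ACL so that users without a group can still read them.
GROUPS_BY_DOCUMENT_TYPE = {
    "blogs": [],
    "user-guides": ["customer", "AWS-Sales", "AWS-Marketing", "AWS-SA"],
    "case-studies": ["AWS-Sales", "AWS-Marketing", "AWS-SA"],
    "analyst-reports": ["AWS-Marketing", "AWS-SA"],
    "whitepapers": ["AWS-SA"],
}


def build_metadata(document_type):
    """Build the body of an <object-name>.metadata.json file."""
    metadata = {"Attributes": {"DocumentType": document_type}}
    groups = GROUPS_BY_DOCUMENT_TYPE[document_type]
    if groups:
        metadata["AccessControlList"] = [
            {"Access": "ALLOW", "Name": name, "Type": "GROUP"} for name in groups
        ]
    return metadata


print(json.dumps(build_metadata("user-guides"), indent=3))
```

Keeping the group lists in one table makes it easier to review who can read what before the files are uploaded and synced.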

Create users and groups in IAM Identity Center

In this section, you create the following mapping for demonstration:

| # | User | Group name |
|---|---|---|
| 1 | Arnav | |
| 2 | Pat | customer |
| 3 | Mary | AWS-SA |

To create users:

    1. Open the AWS IAM Identity Center console.
    2. If you haven’t enabled IAM Identity Center, choose Enable. If there’s a pop-up, choose how you want to enable IAM Identity Center. For this example, select Enable only in this AWS account. Choose Continue.
    3. In the IAM Identity Center dashboard, choose Users in the navigation pane.
    4. Choose Add user.
    5. Enter the user details for Mary:
          Username: mary_major
          Email address: mary_major@example.com
          Note: Use or create a real email address for each user to use in a later step.
          First name: Mary
          Last name: Major
          Display name: Mary Major
    6. Skip the optional fields and choose Next to create the user.
    7. On the Add user to groups page, choose Next and then choose Add user.
    8. Follow the same steps to create users for Pat and Arnav (Guest user). You will assign users to groups at a later step.

To create groups:

    Now, you will create two groups: AWS-SA and customer.

    1. Choose Groups in the navigation pane and choose Create group.
    2. For the group name, enter AWS-SA, add user Mary to the group, and choose Create group.
    3. Similarly, create a group named customer, add user Pat, and choose Create group.
    4. Add multi-factor authentication to the users by following the instructions sent to each user's email address. For more details, see Multi-factor authentication for Identity Center users.

    When done, you will have the users and groups set up in IAM Identity Center.

Create and configure your Amazon Q application

In this step, you create an Amazon Q application that powers the conversation web experience:

    1. On the AWS Management Console for Amazon Q, in the Region list, choose US East (N. Virginia).
    2. On the Getting started page, select Enable identity-aware sessions. Once enabled, Amazon Q connected to IAM Identity Center should be displayed. Choose Subscribe in Q Business.
    3. On the Amazon Q Business console, choose Get started.
    4. On the Applications page, choose Create application.
    5. On the Create application page, enter Application name and leave everything else with default values. Choose Create.
    6. On the Select retriever page, for Retrievers, select Use native retriever. Choose Next. This will take you to the Connect data sources page.

Configure Amazon S3 as the data source

In this section, you walk through an example of adding an S3 connector. The S3 data source contains blogs, user guides, case studies, analyst reports, and whitepapers.

To add the S3 connector:

    1. On the Connect data sources page, select Amazon S3 connector.
    2. For Data source name, enter a name for your data source.
    3. In the IAM role section, select Create new service role (Recommended).
    4. In the Sync scope section, browse to your S3 bucket containing the data files.
    5. Under Advanced settings, for Metadata files prefix folder location, enter Meta/.
    6. Choose Filter patterns. Under Include patterns, enter Data/ as the prefix and choose Add.
    7. For Frequency under Sync run schedule, choose Run on demand.
    8. Leave the rest as default and choose Add data source. Wait until the data source is added.
    9. On the Connect data sources page, choose Next. This will take you to the Add users and groups page.
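The effect of the metadata prefix and the include pattern can be pictured with a small Python sketch. It assumes plain prefix matching, which is a simplification of the connector's filter patterns, and every object key below is made up for illustration.

```python
METADATA_PREFIX = "Meta/"      # Metadata files prefix folder location
INCLUDE_PREFIXES = ("Data/",)  # Include patterns


def classify(key):
    """Return 'metadata', 'document', or 'skipped' for an S3 object key.

    Assumption: simple prefix matching stands in for the connector's filters.
    """
    if key.startswith(METADATA_PREFIX):
        return "metadata"
    if key.startswith(INCLUDE_PREFIXES):
        return "document"
    return "skipped"


# Hypothetical object keys.
keys = [
    "Data/blogs/example-post.html",
    "Data/whitepapers/example-paper.pdf",
    "Meta/example-post.html.metadata.json",
    "docs.zip",
]

for key in keys:
    print(f"{classify(key):9} {key}")
```

Under these assumptions, only objects under Data/ are indexed as documents, objects under Meta/ supply attributes and ACLs, and anything else in the bucket is ignored.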

Add users and groups in Amazon Q

In this section, you set up users and groups to showcase how access can be managed based on the permissions.

    1. On the Add users and groups page, choose Assign existing users and groups and choose Next.
    2. Enter the users and groups you want to add and choose Assign. You will have to enter the user names and groups in the search box and select the user or group.
    3. Verify that users and groups are correctly displayed under the Users and Groups tabs respectively.
    4. Select the Current subscription. In this example, we chose Q Business Lite for groups. Choose the same subscription for users under the Users tab. You can also update subscriptions after creating the application.
    5. Leave the Service role name as default and choose Create application.

Sync S3 data source

With your application created, you will crawl and index the documents in the S3 bucket created at the beginning of the process.

    1. Select the name of the application.
    2. Go to Data sources. Select the radio button next to the S3 data source and choose Sync now.
    3. The sync can take from a few minutes to a few hours. Wait for the sync to complete, then verify that documents have been added.
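Because a sync can run for a long time, it can help to poll for completion instead of watching the console. The Python sketch below shows a generic polling loop. The `get_status` callable is a stand-in that a real caller would back with a query to the service, and the status names are assumptions used for illustration rather than a confirmed list.

```python
import time

# Assumption: names of the states that mean "still running".
IN_PROGRESS = {"SYNCING", "SYNCING_INDEXING"}


def wait_for_sync(get_status, poll_seconds=30.0, timeout_seconds=3 * 60 * 60,
                  sleep=time.sleep):
    """Poll get_status() until the sync leaves the in-progress states."""
    waited = 0.0
    while True:
        status = get_status()
        if status not in IN_PROGRESS:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"sync still {status} after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with a stand-in status source and a no-op sleep so the demo is instant.
fake_statuses = iter(["SYNCING", "SYNCING_INDEXING", "SUCCEEDED"])
print(wait_for_sync(lambda: next(fake_statuses), sleep=lambda seconds: None))
```

Passing the status lookup and the sleep function as arguments keeps the loop easy to test without touching any AWS resources.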

Run queries with Amazon Q

Now that you have configured the Amazon Q application and integrated it with IAM Identity Center, you can test queries from different users based on their group permissions. This will demonstrate how Amazon Q respects the access control rules set up in the Amazon S3 data source.

You have three users for testing—Pat from the Customer group, Mary from the AWS-SA group, and Arnav who isn’t part of any group. According to the access control list (ACL) configuration, Pat should have access to blogs and user guides, Mary should have access to blogs, user guides, case studies, analyst reports, and whitepapers, and Arnav should have access only to blogs.
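Before signing in as each user, it can be useful to write down the expected results as data. The following Python sketch derives the document types each test user should see from their group membership. The user-to-group mapping comes from this walkthrough; the groups allowed per document type are an assumption inferred from the access table, and blogs are modeled as having no ACL.

```python
# Group membership created in IAM Identity Center for this walkthrough.
USER_GROUPS = {"Arnav": [], "Pat": ["customer"], "Mary": ["AWS-SA"]}

# Assumption: groups allowed per document type, inferred from the access table.
# None means the documents carry no ACL and are treated as public.
ALLOWED_GROUPS = {
    "blogs": None,
    "user guides": {"customer", "AWS-Sales", "AWS-Marketing", "AWS-SA"},
    "case studies": {"AWS-Sales", "AWS-Marketing", "AWS-SA"},
    "analyst reports": {"AWS-Marketing", "AWS-SA"},
    "whitepapers": {"AWS-SA"},
}


def expected_access(user):
    """List the document types a user should be able to get answers from."""
    groups = set(USER_GROUPS[user])
    return [
        doc_type
        for doc_type, allowed in ALLOWED_GROUPS.items()
        if allowed is None or groups & allowed
    ]


for user in USER_GROUPS:
    print(f"{user}: {', '.join(expected_access(user))}")
```

The result can serve as a checklist while running the queries in the following steps.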

In the following steps, you will sign in as each user and ask various questions to see what responses Amazon Q provides based on the permitted document types for their respective groups. You will also test edge cases where users try to access information from restricted sources to validate the access control functionality.

Sign in as Pat to the Amazon Q chat interface.

Pat is part of the Customer group and has access to blogs and user guides

When asked a question like “What is AWS?” Amazon Q will provide a summary pulling information from blogs and user guides, highlighting the sources at the end of each excerpt.

Try asking a question that requires information from user guides, such as “How do I set up an AWS account?” Amazon Q will summarize relevant details from the permitted user guide sources for Pat’s group.

However, if you, as Pat, ask a question that requires information from whitepapers, analyst reports, or case studies, Amazon Q will indicate that it could not find any relevant information in the sources Pat has access to.

Ask a question such as “What are the strategic planning assumptions for the year 2025?” to see this.

Sign in as Mary to the Amazon Q chat interface.

Sign out as user Pat. Start a new incognito browser session or use a different browser. Copy the web experience URL and sign in as user Mary. Repeat these steps each time you need to sign in as a different user.

Mary is part of the AWS-SA group, so she has access to blogs, user guides, case studies, analyst reports, and whitepapers.

When Mary asks the same question about strategic planning, Amazon Q will provide a comprehensive summary pulling information from all the permitted sources.

With Mary’s sign-in, you can ask various other questions related to AWS services, architectures, or solutions, and Amazon Q will effectively summarize information from across all the content types Mary’s group has access to.

Sign in as Arnav to the Amazon Q chat interface

Arnav is not part of any group and is able to access only blogs. If Arnav asks a question about Amazon Polly, Amazon Q will return blog posts.

When Arnav tries to get information from the user guides, access is restricted. If they ask about something like how to set up an AWS account, Amazon Q responds that it could not find relevant information.

This shows how Amazon Q respects the data access rules configured in the Amazon S3 data source, allowing users to gain insights only from the content their group has permissions to view, while still providing comprehensive answers when possible within those boundaries.

Troubleshooting

Troubleshooting your Amazon S3 connector provides information about error codes you might see for the Amazon S3 connector and suggested troubleshooting actions. If you encounter an HTTP status code 403 (Forbidden) error when you open your Amazon Q Business application, it means that the user is unable to access the application. See Troubleshooting Amazon Q Business and identity provider integration for common causes and how to address them.

Frequently asked questions

Q. Why isn’t Amazon Q Business answering any of my questions?

A. Verify that you have synced your data source on the Amazon Q console. Also, check the ACLs to ensure you have the required permissions to retrieve answers from Amazon Q.

Q. How can I sync documents without ACLs?

A. When configuring the Amazon S3 connector, under Sync scope, you can optionally choose not to include the metadata or ACL configuration file location in Advanced settings. This will allow you to sync documents without ACLs.

Q. I updated the contents of my S3 data source but Amazon Q Business answers using old data.

A. After content has been updated in your S3 data source location, you must re-sync the contents for the updated data to be picked up by Amazon Q. Go to Data sources, select the radio button next to the S3 data source, and choose Sync now. After the sync is complete, verify that the updated data is reflected by running queries on Amazon Q.

Q. I am unable to sign in as a new user through the web experience URL.

A. Clear your browser cookies and sign in as a new user.

Q. I keep trying to sign in but am getting this error:

A. Try signing in from a different browser or clear browser cookies and try again.

Q. What are the supported document formats and what is considered a document in Amazon S3?

A. See Supported document types and What is a document? to learn more.

Call to action

Explore other features in Amazon Q Business such as:

Clean up

To avoid incurring future charges and to clean out unused roles and policies, delete the resources you created: the Amazon Q application, data sources, and corresponding IAM roles.

    1. To delete the Amazon Q application, go to the Amazon Q console and, on the Applications page, select your application.
    2. On the Actions drop-down menu, choose Delete.
    3. To confirm deletion, enter delete in the field and choose Delete. Wait until you get the confirmation message; the process can take up to 15 minutes.
    4. To delete the S3 bucket created in Prepare your S3 bucket as a data source, empty the bucket and then follow the steps to delete the bucket.
    5. Delete your IAM Identity Center instance.

Conclusion

This blog post has walked you through the steps to build a secure, permissions-based generative AI solution using Amazon Q and Amazon S3 as the data source. By configuring user groups and mapping their access privileges to different document folders in S3, it demonstrated that Amazon Q respects these access control rules. When users query the AI assistant, it provides comprehensive responses by analyzing only the content their group has permission to view, preventing unauthorized access to restricted information. This solution allows organizations to safely unlock insights from their data repositories using generative AI while ensuring data access governance.

Don’t let your data’s potential go untapped. Continue exploring how Amazon Q can transform your enterprise data to gain actionable insights. Join the conversation and share your thoughts or questions in the comments section below.


About the Authors

Kruthi Jayasimha Rao is a Partner Solutions Architect with a focus in AI and ML. She provides technical guidance to AWS Partners in following best practices to build secure, resilient, and highly available solutions in the AWS Cloud.


Keagan Mirazee is a Partner Solutions Architect specializing in Generative AI to assist AWS Partners in engineering reliable and scalable cloud solutions.


Dipti Kulkarni is a Sr. Software Development Engineer for Amazon Q. Dipti is a passionate engineer building connectors for Amazon Q.
