AWS Machine Learning Blog
Build conversational interfaces for structured data using Amazon Bedrock Knowledge Bases

Amazon Bedrock Knowledge Bases simplifies data analysis by letting users interact with structured data in natural language. The service converts natural language questions into SQL queries, eliminating the need for specialized database expertise. It integrates with Amazon Redshift, AWS Glue, and Amazon S3, enabling business analysts, data scientists, and operations teams to query data directly in a conversational way. The article walks through configuring a structured data retrieval solution, with code examples and templates to help users quickly build and scale conversational data interfaces.

💡Amazon Bedrock Knowledge Bases converts natural language queries into SQL queries, simplifying interaction with structured data without requiring specialized database expertise.

⚙️The core components of the solution are Amazon Bedrock Knowledge Bases, Amazon Redshift, AWS Glue, and Amazon S3, which together form a complete data retrieval pipeline.

🔑The data retrieval workflow: the user asks a natural language question, Amazon Bedrock Knowledge Bases generates a SQL query, the SQL query runs on the query engine, the results are sent to an LLM to generate a natural language response, and the response is returned to the user.

🛠️The article provides detailed setup steps, covering configuration of the data ingestion pipeline and the structured data retrieval solution, along with code resources and templates.

✅Amazon Bedrock Knowledge Bases supports three APIs (retrieve and generate response, retrieve only, and generate SQL query) to meet different data retrieval needs.

Organizations manage extensive structured data in databases and data warehouses. Large language models (LLMs) have transformed natural language processing (NLP), yet converting conversational queries into structured data analysis remains complex. Data analysts must translate business questions into SQL queries, creating workflow bottlenecks.

Amazon Bedrock Knowledge Bases enables direct natural language interactions with structured data sources. The system interprets database schemas and context, converting natural language questions into accurate queries while maintaining data reliability standards. You can chat with your structured data by setting up structured data ingestion from AWS Glue Data Catalog tables and Amazon Redshift clusters in a few steps, using the power of Amazon Bedrock Knowledge Bases structured data retrieval.

This post provides instructions to configure a structured data retrieval solution, with practical code examples and templates. It covers implementation samples and additional considerations, empowering you to quickly build and scale your conversational data interfaces. Through clear examples and proven methodologies, organizations can transform their data access capabilities and accelerate decision-making processes.

Solution overview

The solution demonstrates how to build a conversational application using Amazon Bedrock Knowledge Bases structured data retrieval. Developers often face challenges integrating structured data into generative AI applications. This includes difficulties training LLMs to convert natural language queries to SQL queries based on complex database schemas, as well as making sure appropriate data governance and security controls are in place. Amazon Bedrock Knowledge Bases alleviates these complexities by providing a managed natural language to SQL (NL2SQL) module. Amazon Bedrock Knowledge Bases offers an end-to-end managed workflow for you to build custom generative AI applications that can access and incorporate contextual information from a variety of structured and unstructured data sources. Using advanced NLP, Amazon Bedrock Knowledge Bases can transform natural language queries into SQL queries, so you can retrieve data directly from the source without the need to move or preprocess the data.

This solution includes Amazon Bedrock Knowledge Bases, Amazon Redshift, AWS Glue, and Amazon Simple Storage Service (Amazon S3). The solution architecture consists of two parts: a data ingestion pipeline, and a structured data retrieval application using Amazon Bedrock Knowledge Bases.

Amazon Bedrock Knowledge Bases structured data retrieval supports Amazon Redshift as the query engine and multiple data ingestion options. The data ingestion pipeline is a one-time setup, and supports multiple ingestion options. In this post, we discuss a common data ingestion use case using Amazon S3, AWS Glue, and Amazon Redshift.

You can configure Amazon Bedrock Knowledge Bases structured data retrieval to retrieve data from AWS Glue databases and S3 datasets. This setup uses automatic mounting of the Data Catalog in Amazon Redshift. With this ingestion option, you can seamlessly integrate existing S3 datasets and Data Catalog tables into your Retrieval Augmented Generation (RAG) applications with the access permissions configured through Lake Formation. The following diagram illustrates this pipeline.
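Access to the mounted Data Catalog tables is governed through Lake Formation. As a hedged sketch (the role ARN, database, and table names below are hypothetical), granting SELECT and DESCRIBE on one table to the knowledge base execution role might look like:

```python
def build_lf_grant(principal_arn: str, database: str, table: str) -> dict:
    """Assemble a Lake Formation grant_permissions request for one table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT", "DESCRIBE"],
    }

def grant_table_access(principal_arn: str, database: str, table: str) -> None:
    """Apply the grant via the Lake Formation API."""
    import boto3  # imported lazily so the builder above works without boto3 installed
    boto3.client("lakeformation").grant_permissions(
        **build_lf_grant(principal_arn, database, table)
    )
```

Separating the request builder from the API call lets you inspect the grant before applying it.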

The following screenshot shows the configuration options on the Amazon Bedrock console.

After the data ingestion is configured and the knowledge bases data source sync job is complete, users can ask natural language questions, and Amazon Bedrock Knowledge Bases will generate the SQL, execute the SQL against the query engine, and process it through the LLM to provide a user-friendly response. The following diagram illustrates a sample architecture of the structured data retrieval workflow.

The data retrieval workflow consists of the following steps:

    1. In a RAG application, the user asks a natural language data analytics question through the chat interface, such as "What is the sales revenue for the month of February 2025?"
    2. The natural language query is sent to Amazon Bedrock Knowledge Bases for data retrieval and processing.
    3. Amazon Bedrock Knowledge Bases generates a SQL query based on the underlying data schema configured during knowledge base creation.
    4. The SQL query is executed against the query engine (Amazon Redshift) to retrieve data from the structured data store (AWS Glue tables). The query can include multiple joins and aggregations.
    5. The generated SQL response is sent to an LLM along with additional context to generate a response in natural language.
    6. The response is sent back to the user, who can ask follow-up questions based on the retrieved response, such as "What is the product that generated the highest revenue in this period?"
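The end-to-end workflow above can be sketched with the boto3 bedrock-agent-runtime client. The knowledge base ID and model ARN below are placeholders, and the knowledge base must already be created and synced:

```python
def build_rag_request(kb_id: str, model_arn: str, question: str) -> dict:
    """Assemble a RetrieveAndGenerate request for a structured knowledge base."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

def ask(kb_id: str, model_arn: str, question: str) -> str:
    """Send the question to Amazon Bedrock and return the natural language answer."""
    import boto3  # imported lazily so the request builder works without boto3 installed
    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(**build_rag_request(kb_id, model_arn, question))
    return response["output"]["text"]

# Example usage (requires AWS credentials and a synced knowledge base):
# answer = ask("KB123EXAMPLE", "arn:aws:bedrock:us-east-1::foundation-model/...",
#              "What is the sales revenue for the month of February 2025?")
```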

Amazon Bedrock Knowledge Bases structured data retrieval supports three different APIs to meet your data retrieval requirements:

- Retrieve and generate response – generates the SQL query, executes it, and returns a natural language answer
- Retrieve only – generates and executes the SQL query, and returns the raw query results
- Generate SQL query – returns the generated SQL query without executing it

The following screenshot shows the configuration options on the Amazon Bedrock console.
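As a hedged sketch of how the retrieve-only and generate-SQL paths map to boto3 bedrock-agent-runtime calls: the Retrieve request shape below is standard, while the GenerateQuery shape is an assumption based on the service API and should be verified against the current SDK reference.

```python
def build_retrieve_request(kb_id: str, question: str) -> dict:
    """Retrieve only: run the generated SQL and return raw rows, no LLM answer."""
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": question},
    }

def build_generate_query_request(kb_id: str, question: str) -> dict:
    """Generate SQL query only: return the SQL without executing it.
    NOTE: this request shape is an assumption; verify against the SDK docs."""
    return {
        "queryGenerationInput": {"text": question, "type": "TEXT"},
        "transformationConfiguration": {
            "mode": "TEXT_TO_SQL",
            "textToSqlConfiguration": {
                "type": "KNOWLEDGE_BASE",
                "knowledgeBaseConfiguration": {"knowledgeBaseId": kb_id},
            },
        },
    }

def retrieve_rows(kb_id: str, question: str) -> list:
    """Execute the retrieve-only path and return the result rows."""
    import boto3  # imported lazily so the builders work without boto3 installed
    client = boto3.client("bedrock-agent-runtime")
    return client.retrieve(**build_retrieve_request(kb_id, question))["retrievalResults"]
```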

Code resources and templates

The solution uses the following notebooks:

- Data ingestion notebook – loads the sample dataset to Amazon S3 and sets up the AWS Glue database and tables and the Redshift Serverless workgroup
- Structured data retrieval notebook – creates the knowledge base and configures structured data retrieval

For more details, refer to the GitHub repo.

Prerequisites

To implement the solution provided in this post, you must have an AWS account. Additionally, access to the required foundation models must be enabled in Amazon Bedrock.

Set up the data ingestion pipeline

To set up the data ingestion pipeline, we load the sample dataset into an S3 bucket and configure AWS Glue as the data storage and a Redshift Serverless workgroup as the query engine. Complete the following steps in the data ingestion notebook:

    1. For data ingestion, download the following sample ecommerce dataset, convert it to a pandas data frame, and upload it to an S3 bucket using Amazon SageMaker Data Wrangler.
    2. Create an AWS Glue database and table using an AWS Glue crawler by crawling the source S3 bucket with the dataset. You can update this step to crawl your own S3 bucket or use your existing Data Catalog tables as storage metadata.
    3. Use the data ingestion notebook to create a Redshift Serverless namespace and workgroup in the default VPC. If you plan to use your own Redshift Serverless workgroup or Amazon Redshift provisioned cluster, you can skip this step.
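The crawler step above can be sketched with the boto3 Glue client; the crawler name, role ARN, database name, and S3 path below are hypothetical placeholders for your own values:

```python
def build_crawler_config(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Assemble a create_crawler request that crawls one S3 prefix into a Glue database."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }

def run_crawler(config: dict) -> None:
    """Create the crawler and start a crawl run."""
    import boto3  # imported lazily so the builder works without boto3 installed
    glue = boto3.client("glue")
    glue.create_crawler(**config)
    glue.start_crawler(Name=config["Name"])

# Example usage (requires AWS credentials and an existing IAM role for Glue):
# run_crawler(build_crawler_config(
#     "ecommerce-crawler",
#     "arn:aws:iam::111122223333:role/glue-crawler-role",
#     "ecommerce_db",
#     "s3://amzn-s3-demo-bucket/ecommerce/",
# ))
```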

Set up the structured data retrieval solution

In this section, we detail the steps to set up the structured data retrieval component of the solution.

Amazon Bedrock Knowledge Bases supports multiple data access patterns, including AWS Identity and Access Management (IAM), AWS Secrets Manager, and database users. For this post, we demonstrate the setup option with IAM access. You can use IAM access with the Redshift Serverless workgroup configured as part of the ingestion workflow, or with an existing Redshift Serverless or provisioned cluster, to complete these steps.
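As an illustrative sketch of the IAM side of this access pattern: the policy below shows common actions for IAM-based access to a Redshift Serverless workgroup via the Redshift Data API. The exact actions your execution role needs depend on your setup, so treat this as an assumption rather than an authoritative list:

```python
import json

def build_query_policy(workgroup_arn: str) -> dict:
    """Assemble an illustrative IAM policy for IAM-based Redshift Serverless access."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Temporary database credentials for the serverless workgroup
                "Effect": "Allow",
                "Action": ["redshift-serverless:GetCredentials"],
                "Resource": workgroup_arn,
            },
            {
                # Run the generated SQL and fetch results via the Redshift Data API
                "Effect": "Allow",
                "Action": [
                    "redshift-data:ExecuteStatement",
                    "redshift-data:DescribeStatement",
                    "redshift-data:GetStatementResult",
                ],
                "Resource": "*",
            },
        ],
    }

# Example usage: render the policy document as JSON for an IAM role
# print(json.dumps(build_query_policy(
#     "arn:aws:redshift-serverless:us-east-1:111122223333:workgroup/example"), indent=2))
```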

Complete the following steps in the structured data retrieval notebook:

    1. Create an execution role with the necessary policies for accessing data from Amazon Redshift, AWS Glue, and the S3 bucket.
    2. Invoke the CreateKnowledgeBase API to create the knowledge base with the execution role and knowledge base configurations. In the knowledge base configuration, the AWS Glue database and tables are used as storage metadata with Amazon Redshift as the query engine.
    3. After you create the knowledge base, complete the additional steps to make sure the IAM execution role has the necessary permissions to execute the query in Amazon Redshift and retrieve data from AWS Glue. The notebook includes the necessary instructions to create and grant database access to the execution role, and grant AWS Lake Formation permissions.
    4. Run the ingestion job to sync the data store schema metadata about the AWS Glue database and tables with the NL2SQL module. This schema metadata is used while generating the SQL query during structured data retrieval.
    5. After the knowledge base sync job is complete, use the three data retrieval APIs (retrieve and generate response, retrieve only, and generate SQL query) to query and validate the structured data retrieval solution.
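The CreateKnowledgeBase step can be sketched with the boto3 bedrock-agent client. The nested configuration shape below is an assumption reconstructed from the structured (SQL) knowledge base API, and the role ARN, workgroup ARN, and table name are placeholders; verify the field names against the current CreateKnowledgeBase reference before use:

```python
def build_kb_config(name: str, role_arn: str, workgroup_arn: str) -> dict:
    """Assemble a CreateKnowledgeBase request for a Redshift Serverless query engine
    with IAM auth and Data Catalog storage. Field names are an assumption."""
    return {
        "name": name,
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": {
            "type": "SQL",
            "sqlKnowledgeBaseConfiguration": {
                "type": "REDSHIFT",
                "redshiftConfiguration": {
                    "queryEngineConfiguration": {
                        "type": "SERVERLESS",
                        "serverlessConfiguration": {
                            "workgroupArn": workgroup_arn,
                            "authConfiguration": {"type": "IAM"},
                        },
                    },
                    "storageConfigurations": [
                        {
                            "type": "AWS_DATA_CATALOG",
                            # Hypothetical Glue database.table from the ingestion step
                            "awsDataCatalogConfiguration": {
                                "tableNames": ["ecommerce_db.orders"]
                            },
                        }
                    ],
                },
            },
        },
    }

def create_structured_kb(config: dict) -> str:
    """Create the knowledge base and return its ID."""
    import boto3  # imported lazily so the builder works without boto3 installed
    response = boto3.client("bedrock-agent").create_knowledge_base(**config)
    return response["knowledgeBase"]["knowledgeBaseId"]
```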

For more details, refer to Create a knowledge base by connecting to a structured data store.

Clean up

We have included cleanup instructions in both the data ingestion and structured data retrieval notebooks to clean up resources after the end-to-end solution is implemented and validated.

Conclusion

Amazon Bedrock Knowledge Bases simplifies data analysis by converting natural language questions into SQL queries, eliminating the need for specialized database expertise. The service integrates with Amazon Redshift, AWS Glue, and Amazon S3, allowing business analysts, data scientists, and operations teams to query data directly using conversation-like questions. It maintains data security through built-in governance controls and access permissions. Customers can deploy this managed service to enable users to analyze data using natural language questions, while maintaining data integrity and security standards.

To learn more, refer to Build a knowledge base by connecting to a structured data store and Amazon Bedrock Knowledge Bases now supports structured data retrieval.


About the authors

George Belsian is a Senior Cloud Application Architect at Amazon Web Services, helping organizations navigate the complexities of cloud adoption, AI integration, and data-driven innovation. By transforming legacy systems into cloud-based platforms and incorporating AI/ML capabilities, he helps businesses create new opportunities for growth, optimize their processes, and deliver scalable solutions.

Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services, helping businesses innovate with generative AI. He specializes in generative AI, machine learning, and system design. He has successfully delivered state-of-the-art AI/ML-powered solutions to solve complex business problems for diverse industries, optimizing efficiency and scalability.

Mani Khanuja is a Principal Generative AI Specialist SA and author of the book Applied Machine Learning and High-Performance Computing on AWS. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Gopikrishnan Anilkumar is a Principal Technical Product Manager in AWS Agentic AI organization. He has over 10 years of product management experience across a variety of domains and is passionate about AI/ML.
