Build conversational interfaces for structured data using Amazon Bedrock Knowledge Bases

Organizations manage extensive structured data in databases and data warehouses. Large language models (LLMs) have transformed natural language processing (NLP), yet converting conversational queries into structured data analysis remains complex. Data analysts must translate business questions into SQL queries, creating workflow bottlenecks.

Amazon Bedrock Knowledge Bases enables direct natural language interactions with structured data sources. The system interprets database schemas and context, converting natural language questions into accurate queries while maintaining data reliability standards. You can chat with your structured data by setting up structured data ingestion from AWS Glue Data Catalog tables and Amazon Redshift clusters in a few steps, using the power of Amazon Bedrock Knowledge Bases structured data retrieval.

This post provides instructions to configure a structured data retrieval solution, with practical code examples and templates. It covers implementation samples and additional considerations, empowering you to quickly build and scale your conversational data interfaces. Through clear examples and proven methodologies, organizations can transform their data access capabilities and accelerate decision-making processes.

Solution overview

The solution demonstrates how to build a conversational application using Amazon Bedrock Knowledge Bases structured data retrieval. Developers often face challenges integrating structured data into generative AI applications. This includes difficulties training LLMs to convert natural language queries to SQL queries based on complex database schemas, as well as making sure appropriate data governance and security controls are in place. Amazon Bedrock Knowledge Bases alleviates these complexities by providing a managed natural language to SQL (NL2SQL) module. Amazon Bedrock Knowledge Bases offers an end-to-end managed workflow for you to build custom generative AI applications that can access and incorporate contextual information from a variety of structured and unstructured data sources. Using advanced NLP, Amazon Bedrock Knowledge Bases can transform natural language queries into SQL queries, so you can retrieve data directly from the source without the need to move or preprocess the data.

This solution includes Amazon Bedrock Knowledge Bases, Amazon Redshift, AWS Glue, and Amazon Simple Storage Service (Amazon S3). The solution architecture consists of two parts: a data ingestion pipeline, and a structured data retrieval application using Amazon Bedrock Knowledge Bases.

Amazon Bedrock Knowledge Bases structured data retrieval supports Amazon Redshift as the query engine and multiple data ingestion options. The data ingestion pipeline is a one-time setup, and supports multiple ingestion options. In this post, we discuss a common data ingestion use case using Amazon S3, AWS Glue, and Amazon Redshift.

You can configure Amazon Bedrock Knowledge Bases structured data retrieval to retrieve data from AWS Glue databases and S3 datasets. This setup uses automatic mounting of the Data Catalog in Amazon Redshift. With this ingestion option, you can seamlessly integrate existing S3 datasets and Data Catalog tables into your Retrieval Augmented Generation (RAG) applications with the access permissions configured through Lake Formation. The following diagram illustrates this pipeline.

The following screenshot shows the configuration options on the Amazon Bedrock console.

After the data ingestion is configured and the knowledge bases data source sync job is complete, users can ask natural language questions, and Amazon Bedrock Knowledge Bases will generate the SQL, execute the SQL against the query engine, and process it through the LLM to provide a user-friendly response. The following diagram illustrates a sample architecture of the structured data retrieval workflow.

The data retrieval workflow consists of the following steps:

In a RAG application, the user can ask a natural language data analytics question through the chat interface, such as “What is the sales revenue for the Month of February 2025?” The natural language query is sent to Amazon Bedrock Knowledge Bases for data retrieval and processing. Amazon Bedrock Knowledge Bases generates a SQL query based on the underlying data schema configured during the knowledge base creation. The SQL query is executed against the query engine (Amazon Redshift) to retrieve data from a structured data store (AWS Glue tables). The query can include multiple joins and aggregation. The generated SQL response is sent to an LLM along with additional context to generate a response in natural language. The response is sent back to the user. The user can ask follow-up questions based on the retrieved response, such as “What is the product that generated highest revenue in this period?”

Amazon Bedrock Knowledge Bases structured data retrieval supports three different APIs to meet your data retrieval requirements:

Retrieval and response generation

Retrieval only

Generate SQL queries

The following screenshot shows the configuration options on the Amazon Bedrock console.

Code resources and templates

The solution uses the following notebooks:

Data ingestion notebook

Structured-rag-s3-glue-ingestion

Structured data retrieval notebook

Structured-rag-s3-glue-retrieval

For more details, refer to the GitHub repo.

Prerequisites

To implement the solution provided in this post, you must have an AWS account. Additionally, access to the required foundation models must be enabled in Amazon Bedrock.

Set up the data ingestion pipeline

To set up the data ingestion pipeline, we load the sample dataset in an S3 bucket and configure AWS Glue as data storage and a Redshift Serverless workgroup as the query engine. Complete the following steps in data ingestion notebook:

sample ecommerce dataset

Amazon SageMaker Data Wrangler

AWS Glue crawler

default VPC

Set up the structured data retrieval solution

In this section, we detail the steps to set up the structured data retrieval component of the solution.

Amazon Bedrock Knowledge Bases supports multiple data access patterns, including AWS Identity and Access Management (IAM), AWS Secrets Manager, and database users. For this post, we demonstrate the setup option with IAM access. You can use IAM access with the Redshift Serverless workgroup configured as part of the ingestion workflow or an existing Redshift Serverless or provisioned cluster to compete these steps.

Complete the following steps in structured data retrieval notebook:

CreateKnowledgeBase API

complete additional steps

AWS Lake Formation

For more details, refer to Create a knowledge base by connecting to a structured data store.

Clean up

We have included cleanup instructions in both the data ingestion and structured data retrieval notebooks to clean up resources after the end-to-end solution is implemented and validated.

Conclusion

Amazon Bedrock Knowledge Bases simplifies data analysis by converting natural language questions into SQL queries, eliminating the need for specialized database expertise. The service integrates with Amazon Redshift, AWS Glue, and Amazon S3, allowing business analysts, data scientists, and operations teams to query data directly using conversation-like questions. It maintains data security through built-in governance controls and access permissions. Customers can deploy this managed service to enable users to analyze data using natural language questions, while maintaining data integrity and security standards.

To learn more, refer to Build a knowledge base by connecting to a structured data store and Amazon Bedrock Knowledge Bases now supports structured data retrieval.

About the authors

George Belsian is a Senior Cloud Application Architect at Amazon Web Services, helping organizations navigate the complexities of cloud adoption, AI integration, and data-driven innovation. By transforming legacy systems into cloud-based platforms and incorporating AI/ML capabilities, he helps businesses create new opportunities for growth, optimize their processes, and deliver scalable solutions.

Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services, helping businesses innovate with generative AI. He specializes in generative AI, machine learning, and system design. He has successfully delivered state-of-the-art AI/ML-powered solutions to solve complex business problems for diverse industries, optimizing efficiency and scalability.

Mani Khanuja is a Principal Generative AI Specialist SA and author of the book Applied Machine Learning and High-Performance Computing on AWS. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Gopikrishnan Anilkumar is a Principal Technical Product Manager in AWS Agentic AI organization. He has over 10 years of product management experience across a variety of domains and is passionate about AI/ML.

Solution overview

Code resources and templates

Prerequisites

Set up the data ingestion pipeline

Set up the structured data retrieval solution

Clean up

Conclusion

About the authors

Fish AI Reader

FishAI

联系邮箱 441953276@qq.com

相关标签