AWS Machine Learning Blog
How Nippon India Mutual Fund improved the accuracy of AI assistant responses using advanced RAG methods on Amazon Bedrock

This article describes how Nippon Life India Asset Management Ltd. significantly improved the accuracy of its AI assistant responses by optimizing Retrieval Augmented Generation (RAG). It notes that traditional RAG approaches struggle with accuracy, parsing complex document structures, and answering complex questions when processing large volumes of documents. To address these issues, Nippon Life adopted advanced RAG methods including Amazon Bedrock Knowledge Bases, semantic chunking, query rewriting, multi-query RAG, and results reranking. The article also covers parsing complex documents with Amazon Textract, splitting compound queries with an LLM, and evaluating RAG performance, offering practical lessons for enterprise AI assistant applications.

🗂️ **Limitations of traditional RAG and Nippon Life's solution**: The article identifies the shortcomings of traditional RAG when processing large document sets, including limits on the number of returned results, difficulty parsing complex structures, low data utilization, and trouble answering compound questions. By adopting Amazon Bedrock Knowledge Bases together with advanced RAG methods such as semantic chunking, multi-query RAG, and results reranking, Nippon Life effectively addressed these issues and significantly improved response accuracy and relevance.

📄 **Complex document parsing and chunking strategy**: For complex document structures such as tables, images, and graphs, Nippon Life uses Amazon Textract to parse documents into Markdown format and combines this with LLM-based parsing to preserve more context. For chunking, Nippon Life chose semantic chunking, which produces more meaningful and complete chunks and improves retrieval quality over traditional fixed chunking.

❓ **Improving handling of complex queries**: To handle complex user questions, Nippon Life applies query rewriting and multi-query RAG. An LLM splits a compound query into multiple sub-queries, which are executed in parallel, aggregated, and reranked before the LLM generates a more comprehensive, in-depth answer. This approach adds some response time but significantly improves answer quality and depth.

🔄 **Results reranking and evaluation**: When processing large document sets, the similarity ranking from a vector database is not always the most relevant. Nippon Life introduced a reranking model that rescores and reorders retrieved results along dimensions such as context and relevance, so that users receive the most relevant context. The article also stresses the importance of RAG evaluation and notes the built-in quality and responsible AI metrics in Amazon Bedrock Knowledge Bases, which Nippon Life uses to validate its solution.

This post is co-written with Abhinav Pandey from Nippon Life India Asset Management Ltd.

Accurate information retrieval through generative AI-powered assistants is a popular use case for enterprises. To reduce hallucination and improve overall accuracy, Retrieval Augmented Generation (RAG) remains the most commonly used method for grounding responses in reliable enterprise data when answering user queries. RAG is used for AI assistants, search, real-time insights, and improving overall content quality, because the response is generated from the relevant data.

Amazon Bedrock Knowledge Bases provides a managed RAG experience that can be used for many use cases. Amazon Bedrock Knowledge Bases is a fully managed service that does the heavy lifting of implementing a RAG pattern—including data ingestion, data chunking, data embedding, and query matching. Amazon Bedrock offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock Knowledge Bases, you can create a RAG solution quickly and seamlessly.

However, in a large enterprise scenario with a large number of relevant documents, the final response is generated based on only the top 5 or top 10 results provided by the database. Because RAG uses a similarity match and not an exact match, there is a possibility that the most relevant result won’t be in the top results returned by the database. In such scenarios, the regular RAG pattern might not be highly accurate.

In this post, we examine a solution adopted by Nippon Life India Asset Management Limited that improves the accuracy of the response over a regular (naive) RAG approach by rewriting the user queries and aggregating and reranking the responses. The proposed solution uses enhanced RAG methods such as reranking to improve the overall accuracy.

Limitations and workarounds in naive RAG for a large volume of documents

The following are the primary limitations with naive RAG when using it with a large volume of documents:

To address these challenges, developers usually adopt a combination of the following workarounds:

Solution overview

In this section, we review the basics of RAG, Amazon Bedrock Knowledge Bases, and advanced RAG methods to address the preceding challenges. The following table details the specific solution components adopted by Nippon to overcome the challenges we discussed in the previous section.

| Naive RAG challenges | How Nippon addressed these challenges |
| --- | --- |
| Lower accuracy due to high volume of documents | Use Amazon Bedrock Knowledge Bases with advanced RAG methods, including semantic chunking, multi-query RAG, and results reranking |
| Parsing complex document structures such as nested tables and graphs within documents | Use Amazon Textract to parse the documents into markdown files |
| Handling compound questions | Use query reformulation and results reranking to get the relevant results |
| Retrieving current context from the documents | Use semantic chunking in Amazon Bedrock Knowledge Bases |
| Generating grounded responses without any hallucination | Use Amazon Bedrock Knowledge Bases RAG evaluation |

In the next few sections, we detail each of the solution components. We start with an overview of RAG and then discuss Amazon Bedrock Knowledge Bases. We then discuss advanced RAG methods adopted by Nippon, including advanced parsing, query reformulation, multi-query RAG, and results reranking.

RAG overview

The RAG pattern for this solution has two primary workflows: a data ingestion workflow and a text generation workflow, as depicted in the following figure. During the ingestion workflow, the solution chunks the content of the source documents, creates embeddings for the chunks, and stores them in a vector database. In the text generation workflow, the user query is converted to a vector embedding, and the query embedding is compared with the embeddings stored in the vector database. The database returns the top results closest to the user query. The solution then sends the user query, along with the retrieved top results as context, to the FM, which provides the final response. Because the responses are based on relevant contextual enterprise data, this approach reduces hallucinations.

Ingestion & Text generation workflows
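
To make the two workflows concrete, the following is a minimal, self-contained sketch. It assumes access to an Amazon Titan text embeddings model in Amazon Bedrock; the model ID, the in-memory list used as a stand-in for the vector database, and the sample chunks are illustrative only.

```python
# Minimal RAG sketch (illustrative only): embed chunks, store them in memory,
# then retrieve the closest chunk for a user query.
# Assumes an Amazon Titan text embeddings model is enabled in the account;
# verify the model ID and response shape against the Amazon Bedrock documentation.
import json
import boto3
import numpy as np

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
EMBED_MODEL_ID = "amazon.titan-embed-text-v2:0"  # placeholder: use the model enabled in your account

def embed(text: str) -> np.ndarray:
    """Return a vector embedding for a piece of text."""
    response = bedrock_runtime.invoke_model(
        modelId=EMBED_MODEL_ID,
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(response["body"].read())["embedding"])

# Ingestion workflow: chunk documents and store their embeddings.
chunks = [
    "Fund ABC invests primarily in large-cap equities.",
    "Redemption requests are processed within three business days.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Text generation workflow: embed the query and retrieve the closest chunks.
query = "How long does a redemption take?"
q_vec = embed(query)
scored = sorted(
    index,
    key=lambda item: float(
        np.dot(q_vec, item[1]) / (np.linalg.norm(q_vec) * np.linalg.norm(item[1]))
    ),
    reverse=True,
)
top_context = [chunk for chunk, _ in scored[:1]]
# top_context is then passed to the FM together with the query to ground the answer.
```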

You can implement this solution using AWS services as shown in the following figure.

RAG implementation using AWS services

The following is a typical RAG-based AI assistant flow. The first series of steps, as numbered in the preceding diagram, augment the user query using a knowledge base:

1. User queries are served by Amazon Lex hosted on Amazon CloudFront. The business logic for the AI assistant can run on Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), or AWS Lambda. You can manage the conversation history or state in Amazon DynamoDB, Amazon DocumentDB (with MongoDB compatibility), or Amazon MemoryDB.
2. The user query and the earlier conversation history are sent to the Amazon Bedrock embedding model.
3. User prompts are compared to earlier prompts cached in MemoryDB. If the user prompt matches a stored prompt, the stored prompt result is retrieved and sent to the user.
4. If there's no match for the user prompt in the cache, the vector database (such as Amazon OpenSearch, Amazon Aurora, Amazon Kendra, MemoryDB, DocumentDB, or Amazon Neptune) is searched for the query embedding.
5. The relevant knowledge or the Amazon Simple Storage Service (Amazon S3) path to the relevant image or video is retrieved.
6. The relevant knowledge, image, or video is retrieved from the Amazon S3 path.
7. The multimodal large language model (LLM) in Amazon Bedrock uses the query results to create a final response.
8. The new answer is stored in the database cache.
9. The final response is sent to the user.
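
The prompt-cache step in this flow can be sketched as follows. This is an illustrative stand-in that uses an in-memory list and cosine similarity instead of MemoryDB; it reuses the hypothetical `embed()` helper from the earlier sketch, and `run_rag_pipeline` stands in for the remaining retrieval and generation steps.

```python
# Illustrative semantic cache check, a simplified stand-in for the MemoryDB-backed cache:
# reuse a stored answer when a new prompt is close enough to a previously cached prompt.
import numpy as np

CACHE: list[tuple[np.ndarray, str]] = []  # (prompt embedding, stored answer)
SIMILARITY_THRESHOLD = 0.90               # assumption: tune on real traffic

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer_with_cache(prompt: str, run_rag_pipeline) -> str:
    """Return a cached answer on a near-match, otherwise run the full RAG flow and cache it."""
    q_vec = embed(prompt)  # embed() as defined in the earlier sketch
    for cached_vec, cached_answer in CACHE:
        if cosine(q_vec, cached_vec) >= SIMILARITY_THRESHOLD:
            return cached_answer        # cache hit: skip retrieval and generation
    answer = run_rag_pipeline(prompt)   # caller supplies the retrieval + generation steps
    CACHE.append((q_vec, answer))
    return answer
```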

The workflow to create and maintain the knowledge base is shown in the previous diagram:

1. The data is ingested from structured and unstructured sources such as a file system, emails, or multi-media content.
2. The data is ingested into Amazon S3 in batches or in real time using services such as AWS Database Migration Service (AWS DMS), Amazon Kinesis, Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon AppFlow, or AWS Glue.
3. The data is processed through AWS Glue, Amazon SageMaker processing, or Amazon EMR.
4. An embeddings LLM from Amazon Bedrock generates the embeddings for the content chunks.
5. The embeddings are stored in the vector database.

Amazon Bedrock Knowledge Bases

Amazon Bedrock Knowledge Bases provides managed RAG, which does the heavy lifting of the core activities in the data ingestion and text generation workflows. Nippon uses Amazon Bedrock Knowledge Bases for implementing the RAG pattern. The following figure shows the fully managed data ingestion workflow provided by Amazon Bedrock Knowledge Bases. You can choose the data source from which data of various formats can be incrementally updated. For content chunking, you can choose from the following strategies: fixed, semantic, or hierarchical. You can also choose the embedding models and vector store.

Data ingestion flow

The following table compares various chunking strategies. You can also author a custom chunking strategy using Lambda.

| Chunking strategy | Details | Advantages |
| --- | --- | --- |
| Fixed chunking | Fixed character count division; recursive character text splitting | Quick and efficient |
| Semantic chunking (used by Nippon) | Meaningful and complete chunks based on semantic similarity | Better retrieval quality |
| Hierarchical chunking | Organizes chunks in a structured manner, with nodes that have summaries and parent-child relationships | Improved retrieval efficiency and reliability |
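
The chunking strategy is selected when the data source is attached to the knowledge base. The following is a minimal sketch of configuring semantic chunking with the bedrock-agent API; the knowledge base ID, bucket ARN, and semantic chunking parameter values are placeholders, and the field names should be verified against the current API reference.

```python
# Sketch: attach an S3 data source to a knowledge base with semantic chunking.
# IDs, ARNs, and parameter values below are placeholders.
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

response = bedrock_agent.create_data_source(
    knowledgeBaseId="KB_ID_PLACEHOLDER",
    name="policy-documents",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::example-docs-bucket"},
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "SEMANTIC",
            "semanticChunkingConfiguration": {
                # Assumed parameters: max tokens per chunk, sentence buffer size,
                # and the percentile threshold used to place semantic breakpoints.
                "maxTokens": 300,
                "bufferSize": 1,
                "breakpointPercentileThreshold": 95,
            },
        }
    },
)
print(response["dataSource"]["dataSourceId"])
```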

Nippon uses semantic chunking because the documents have sections with semantic similarity. During the text generation workflow, Amazon Bedrock Knowledge Bases creates embeddings from the user input and performs semantic search in the Amazon Bedrock knowledge base. The retrieved results from the context are augmented with a prompt for the LLM to generate the final response as shown in the following figure.

RAG Flow
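
The managed text generation workflow can be invoked with a single RetrieveAndGenerate call, as in the following sketch; the knowledge base ID, model ARN, and sample query are placeholders.

```python
# Sketch: managed RAG query against an Amazon Bedrock knowledge base.
# The knowledge base ID and model ARN are placeholders.
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "What is the exit load for Fund ABC?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB_ID_PLACEHOLDER",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
print(response["output"]["text"])           # grounded answer
for citation in response.get("citations", []):
    print(citation)                          # source references used for the answer
```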

Advanced RAG methods adopted by Nippon

In this section, we review the key advanced RAG methods used in the Nippon AI assistant solution.

Advanced parsing and chunking

A complex document with tables, images, and graphs poses challenges in RAG because the default fixed chunking often loses the context. For these scenarios, Nippon uses the following approaches:

- Parse complex documents with Amazon Textract and convert the extracted content, including nested tables and graphs, into markdown files
- Use an LLM-assisted parsing pass (foundation model as a parser) so that tables, images, and graphs retain their surrounding context

After the documents are parsed, Amazon Bedrock Knowledge Bases chunks the data. Amazon Bedrock Knowledge Bases provides fixed chunking, semantic chunking, and hierarchical chunking strategies. You can also add custom chunking using Lambda functions for specific documents.
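
As a sketch of the table-extraction part of this parsing approach, the following uses the Amazon Textract AnalyzeDocument API with the TABLES feature and prints each detected table as markdown rows. The S3 location is a placeholder, multi-page documents require the asynchronous Textract APIs, and handling of images, graphs, and FM-assisted interpretation is out of scope here.

```python
# Sketch: extract tables from a document with Amazon Textract and emit markdown rows.
# The S3 bucket and document name are placeholders. The synchronous API handles
# single-page documents; use start_document_analysis for multi-page PDFs.
import boto3

textract = boto3.client("textract", region_name="us-east-1")

result = textract.analyze_document(
    Document={"S3Object": {"Bucket": "example-docs-bucket", "Name": "factsheet-page-1.pdf"}},
    FeatureTypes=["TABLES"],
)

blocks = {b["Id"]: b for b in result["Blocks"]}

def cell_text(cell: dict) -> str:
    """Concatenate the WORD children of a CELL block."""
    words = []
    for rel in cell.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)

for table in (b for b in result["Blocks"] if b["BlockType"] == "TABLE"):
    rows: dict[int, dict[int, str]] = {}
    for rel in table.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for cell_id in rel["Ids"]:
                cell = blocks[cell_id]
                if cell["BlockType"] == "CELL":
                    rows.setdefault(cell["RowIndex"], {})[cell["ColumnIndex"]] = cell_text(cell)
    for row_index in sorted(rows):
        cells = [rows[row_index].get(col, "") for col in sorted(rows[row_index])]
        print("| " + " | ".join(cells) + " |")  # one markdown table row per line
```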

Query reformulation

If the user queries are complex, the overall accuracy of the RAG solution decreases because it can be difficult to understand all the nuances of the complex question. In such cases, you can split the complex question into multiple sub-questions for improved accuracy. Nippon used an LLM to split the compound query into multiple sub-queries. The search results for each of the sub-queries are returned simultaneously and are then ranked and combined. The LLM uses the combined data to provide the final response, as shown in the following figure.

Query reformulation
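
A minimal sketch of the query-splitting step, assuming access to Anthropic's Claude on Amazon Bedrock through the Converse API; the prompt wording and model ID are illustrative.

```python
# Sketch: use an LLM to split a compound user query into simpler sub-queries.
# The model ID and prompt are illustrative.
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

def split_query(compound_query: str) -> list[str]:
    prompt = (
        "Split the following question into independent sub-questions. "
        "Return a JSON array of strings and nothing else.\n\n"
        f"Question: {compound_query}"
    )
    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.0},
    )
    return json.loads(response["output"]["message"]["content"][0]["text"])

sub_queries = split_query(
    "What is the expense ratio of Fund ABC and how did its returns compare to the benchmark last year?"
)
print(sub_queries)
```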

Multi-query RAG

In the multi-query RAG approach, you can reformulate the question into different variants (Amazon Bedrock Knowledge Bases provides automatic query reformulation) and execute the queries in parallel. You then summarize the results of those queries and rerank them to get the final response as shown in the following figure.

Multi-query RAG

Nippon takes the initial question and breaks it into n sub-questions using their choice of LLM (Anthropic's Claude 3 Sonnet on Amazon Bedrock). With this implementation, the quality of the responses has improved, and queries are answered in greater depth. Because this requires additional processing, the response time is slightly longer, which is justified by the quality of the response.
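
The following sketch runs the sub-queries in parallel against the knowledge base and pools the retrieved chunks; it reuses the hypothetical `split_query()` helper from the previous sketch, and the knowledge base ID is a placeholder.

```python
# Sketch: execute sub-queries in parallel against the knowledge base and pool the chunks.
# Assumes split_query() from the previous sketch; the knowledge base ID is a placeholder.
from concurrent.futures import ThreadPoolExecutor
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
KNOWLEDGE_BASE_ID = "KB_ID_PLACEHOLDER"

def retrieve_chunks(query: str) -> list[dict]:
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        retrievalQuery={"text": query},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
    )
    return response["retrievalResults"]

question = "What is the expense ratio of Fund ABC and how did its returns compare to the benchmark last year?"
sub_queries = split_query(question)

with ThreadPoolExecutor(max_workers=len(sub_queries)) as pool:
    results_per_query = list(pool.map(retrieve_chunks, sub_queries))

# Pool the chunks; deduplication and reranking (next section) decide the final order.
pooled_chunks = [chunk for results in results_per_query for chunk in results]
print(len(pooled_chunks), "chunks retrieved across", len(sub_queries), "sub-queries")
```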

Results reranking

In a typical enterprise scenario where you have a large volume of documents, the ranking of the results might not be correct, and accuracy can be reduced. For example, the default results returned from a vector database are ranked on vector similarity, and that ranking might not be accurate or relevant given the user's conversation history or the query context. In such scenarios, you can use another FM to look closely at the results and reorder them based on an analysis of the context, relevance, and other criteria. After the results are reranked, you can be more confident that the top results are highly relevant to the user's context. Each document or search result is assigned a score based on criteria such as semantic similarity, contextual alignment, or domain-specific features, thereby increasing the overall accuracy of the final response, reducing hallucination, and improving user satisfaction. The key reranker strategies are:

Amazon Bedrock provides reranker models that calculate the relevance of the chunks for a given query, assign a score to each chunk, and reorder the results based on the score. By using a reranker model in Amazon Bedrock, you can get highly relevant chunks in the search results, improving the overall response accuracy. Nippon has combined query reformulation with results reranking, using the Amazon Bedrock reranker models, to improve the overall accuracy of the RAG solution.
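
The following sketch rescores a pooled list of chunks with a reranker model. The model ARN is a placeholder, and the request shape reflects a common usage of the Rerank API, so verify the field names against the current API reference.

```python
# Sketch: rescore pooled chunks with a reranker model and keep the most relevant ones.
# The reranker model ARN is a placeholder; verify the request shape against the API reference.
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-west-2")
RERANK_MODEL_ARN = "arn:aws:bedrock:us-west-2::foundation-model/amazon.rerank-v1:0"

def rerank_chunks(query: str, chunk_texts: list[str], top_n: int = 5) -> list[str]:
    response = bedrock_agent_runtime.rerank(
        queries=[{"type": "TEXT", "textQuery": {"text": query}}],
        sources=[
            {
                "type": "INLINE",
                "inlineDocumentSource": {"type": "TEXT", "textDocument": {"text": text}},
            }
            for text in chunk_texts
        ],
        rerankingConfiguration={
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "modelConfiguration": {"modelArn": RERANK_MODEL_ARN},
                "numberOfResults": top_n,
            },
        },
    )
    # Each result carries the index of the original chunk and a relevance score.
    return [chunk_texts[r["index"]] for r in response["results"]]
```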

Metrics for RAG evaluation

Because RAG solutions have multiple moving parts, you need to evaluate them against key success metrics to make sure that you get relevant and contextual data from the knowledge base. RAG evaluation should validate the generation of complete, correct, and grounded answers without hallucinations. You also need to evaluate the bias, safety, and trust of RAG solutions.

Amazon Bedrock Knowledge Bases provides built-in support for RAG evaluation, including quality metrics such as correctness, completeness, and faithfulness (hallucination detection); responsible AI metrics such as harmfulness, answer refusal, and stereotyping; and compatibility with Amazon Bedrock Guardrails. Nippon used RAG evaluation to run and compare multiple evaluation jobs using custom datasets.
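
Nippon relied on the built-in evaluation in Amazon Bedrock Knowledge Bases. Purely as an illustration of the faithfulness idea, the following generic LLM-as-judge check asks a model whether an answer is supported by its retrieved context; this is not the managed evaluation feature, and the model ID is a placeholder.

```python
# Generic illustration of a faithfulness (grounding) check using an LLM as a judge.
# This is not the managed Amazon Bedrock Knowledge Bases evaluation feature.
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
JUDGE_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"  # placeholder judge model

def is_grounded(context: str, answer: str) -> bool:
    """Ask the judge model whether every claim in the answer is supported by the context."""
    prompt = (
        "Given the context below, reply with YES if every claim in the answer is "
        "supported by the context, otherwise reply with NO.\n\n"
        f"Context:\n{context}\n\nAnswer:\n{answer}"
    )
    response = bedrock_runtime.converse(
        modelId=JUDGE_MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 5, "temperature": 0.0},
    )
    verdict = response["output"]["message"]["content"][0]["text"].strip().upper()
    return verdict.startswith("YES")
```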

Nippon is currently evaluating additional techniques, including GraphRAG, metadata filtering, and agentic AI. We have briefly summarized the capabilities being evaluated at Nippon in the following sections.

GraphRAG

For applications that need to establish relationships across hierarchically related data (such as knowledge management, enterprise search, and recommendation systems), Nippon is evaluating a graph database instead of a vector database. GraphRAG applications use rich, interconnected entity relationships to identify dependencies and excel at querying multi-dimensional relationships, which further strengthens the context available to FMs. In addition, with graph databases, Nippon can efficiently query and traverse the data, and the flexible schema accommodates dynamic content. Graph databases help FMs better understand semantic relationships, uncover hidden patterns, adapt to dynamic knowledge bases, and reason more effectively.

Amazon Bedrock Knowledge Bases supports fully managed GraphRAG, which uses the Amazon Neptune graph database.

Metadata filtering

In a few use cases, you might need to filter the documents in a knowledge base based on specific metadata values. For instance, government regulatory bodies release regulatory guidelines frequently, often with the same document names and only minor variations in the regulatory clauses. When your query is about a specific regulation, you want the most recent document to be returned. In such cases, you can rank the documents based on their modified date. Amazon Bedrock Knowledge Bases provides custom metadata filters (such as modified date) to improve the quality of search results.
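
The following sketch applies a metadata filter on the Retrieve API, assuming documents were ingested with a custom modified_date metadata attribute; the attribute name, date format, knowledge base ID, and filter operator are placeholders to verify against the API reference.

```python
# Sketch: restrict retrieval to documents modified on or after a given date.
# Assumes a custom "modified_date" metadata attribute was attached at ingestion time.
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId="KB_ID_PLACEHOLDER",
    retrievalQuery={"text": "What are the latest margin requirements?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,
            "filter": {
                "greaterThanOrEquals": {"key": "modified_date", "value": "2024-01-01"}
            },
        }
    },
)
for item in response["retrievalResults"]:
    print(item["score"], item["content"]["text"][:80])
```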

Amazon Bedrock Agents

With Amazon Bedrock Agents, you can orchestrate multi-step business processes using tools and information sources. Nippon is currently evaluating the latest FMs in Amazon Bedrock Agents for their AI assistant use case.

Solution flow

The following diagram shows the end-to-end flow of the Nippon AI assistant solution.

End-to-end flow of the Nippon AI assistant solution

The data ingestion workflow consists of the following steps:

1. The documents from the data source are prepared for ingestion.
2. Nippon uses custom parsing to extract the relevant details from complex elements like tables, graphs, and images.
3. They use custom chunking to create the chunks.
4. They use an embedding model from Amazon Bedrock to convert the content chunks to vector embeddings and store them in the vector database.

The content generation workflow is as follows:

1. The user's query is converted into embeddings by the Amazon Bedrock embedding model.
2. Nippon uses multi-query RAG by creating multiple variants of the query and executing them in parallel.
3. The obtained results are reranked using a reranker model for higher accuracy.
4. The prompt is augmented with the references from the source documents to create citations.
5. The augmented prompt is sent to Amazon Bedrock.
6. Amazon Bedrock creates and sends the final response to the user.

Nippon plans to use an agentic AI implementation in the future to automate data retrieval, indexing, and ingestion.

Results

Nippon saw the following improvements after implementing RAG:

Summary

In this post, we discussed RAG and some of the challenges associated with processing large volumes of documents. We explained the advanced RAG methods used in the Nippon AI assistant, including enhanced parsing using Amazon Bedrock Knowledge Bases and third-party models, query reformulation, multi-query RAG, and results reranking, as well as techniques under evaluation such as GraphRAG and metadata filtering. Finally, we described the end-to-end solution implemented for the Nippon AI assistant.

Explore Amazon Bedrock Knowledge Bases for RAG use cases by using advanced RAG features such as FM as a parser, query reformulation, reranker models, GraphRAG, and others to implement highly accurate RAG solutions. You can also use Amazon Bedrock Guardrails to build responsible AI solutions by enforcing content and image safeguards and enabling automated reasoning checks.

Please note that RAG methods mentioned in the blog are generally available to all and are not built by or belong only to Nippon. 

About the authors

Shailesh Shivakumar is an FSI Sr. Solutions Architect with AWS India. He works with financial enterprises such as banks, NBFCs, and trading enterprises to help them design secure cloud systems and accelerate their cloud journey. He builds demos and proofs of concept to demonstrate the art of the possible on the AWS Cloud. He leads other initiatives such as customer enablement workshops, AWS demos, cost optimization, and solution assessments to make sure AWS customers succeed in their cloud journey. Shailesh is part of Machine Learning TFC at AWS, handling generative AI and machine learning-focused customer scenarios. Security, serverless, containers, and machine learning in the cloud are his key areas of interest.

Abhinav Pandey is a seasoned Data Scientist on the Technology team at Nippon Life India Asset Management Ltd, with over 18 years of industry experience, primarily in BFSI, who is passionate about using generative AI and agentic AI to transform business operations. With a proven track record of applying data for strategic decision-making and business growth, he excels at extracting actionable insights from complex datasets using cutting-edge analytical techniques. A strategic thinker and innovative problem solver, he has developed data-driven strategies that enhance operational efficiency and profitability while working effectively with cross-functional teams to align data initiatives with business objectives. In his leadership role, he has driven data science initiatives, fostered innovation, maintained a results-driven approach, and continuously adapted to evolving technologies to stay ahead of industry trends.
