AWS Machine Learning Blog
Tailoring foundation models for your business needs: A comprehensive guide to RAG, fine-tuning, and hybrid approaches

This article explores how to tailor foundation models (FMs) to specific business needs, focusing on three approaches: Retrieval Augmented Generation (RAG), fine-tuning, and a hybrid approach that combines the two. RAG enhances AI capabilities by connecting models to external knowledge sources, while fine-tuning adjusts model parameters using domain-specific data. The hybrid approach combines the strengths of both, keeping the model aligned with the company's overall tone while using RAG to fetch up-to-date information. The article also shows how to implement these approaches with AWS services such as Amazon Bedrock and SageMaker, and provides code examples and evaluation metrics to help readers choose the option that best fits their business needs.

🔑 RAG (Retrieval Augmented Generation) is a cost-effective way to enhance AI by connecting models to external knowledge sources to improve their responses. It consists of two steps: retrieval, which finds relevant information in the connected data sources, and generation, which uses an FM to produce an answer based on the retrieved information.

⚙️ Fine-tuning is a powerful way to customize an FM for a specific task or domain using additional training data. Adjusting the model's parameters with a small labeled dataset relevant to the target domain enables it to deliver more accurate, context-aware responses. Compared with RAG, however, fine-tuning requires more computational resources and time.

✨ The hybrid approach combines the advantages of RAG and fine-tuning to deliver highly accurate, context-aware responses. Periodically fine-tuning the model keeps it aligned with the company's overall tone, while RAG retrieves the latest information from the knowledge base, reducing cost and improving efficiency. Fine-tuning also enhances RAG's performance during the generation phase.

Foundation models (FMs) have revolutionized AI capabilities, but adopting them for specific business needs can be challenging. Organizations often struggle with balancing model performance, cost-efficiency, and the need for domain-specific knowledge. This blog post explores three powerful techniques for tailoring FMs to your unique requirements: Retrieval Augmented Generation (RAG), fine-tuning, and a hybrid approach combining both methods. We dive into the advantages, limitations, and ideal use cases for each strategy.

AWS provides a suite of services and features to simplify the implementation of these techniques. Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Amazon Bedrock Knowledge Bases provides native support for RAG, streamlining the process of enhancing model outputs with domain-specific information. Amazon Bedrock also offers native features for model customization through continued pre-training and fine-tuning. In addition, you can use Amazon Bedrock Custom Model Import to bring and use your customized models alongside existing FMs through a single serverless, unified API. You can also use Amazon Bedrock Model Distillation to create smaller, faster, more cost-effective models that deliver use-case-specific accuracy comparable to the most advanced models in Amazon Bedrock.

For this post, we used Amazon SageMaker AI for the fine-tuning and hybrid approaches to maintain more control over the fine-tuning script and to try different fine-tuning methods. We used Amazon Bedrock Knowledge Bases for the RAG approach, as shown in Figure 1.

To help you make informed decisions, we provide ready-to-use code in our GitHub repo, using these AWS services to experiment with RAG, fine-tuning, and hybrid approaches. You can evaluate their performance based on your specific use case and dataset, and adopt the approach that best fits to effectively customize FMs for your business needs.

Figure 1: Architecture diagram for RAG, fine-tuning, and hybrid approaches

Retrieval Augmented Generation

RAG is a cost-effective way to enhance AI capabilities by connecting existing models to external knowledge sources. For example, an AI-powered customer service chatbot using RAG can answer questions about current product features by first checking the product documentation knowledge base. If a customer asks a question, the system retrieves the specific details from the product knowledge base before composing its response, helping to make sure that the information is accurate and up to date.

A RAG approach gives AI models access to external knowledge sources for better responses. It has two main steps: retrieval, which finds relevant information in the connected data sources, and generation, in which an FM generates an answer based on the retrieved information.
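As a rough illustration of these two steps, the following sketch calls the Amazon Bedrock RetrieveAndGenerate API via boto3, which performs retrieval and generation in a single request. The knowledge base ID, region, and model ARN are placeholders, not values from this post's repo.

    import boto3

    # Sketch: one-call RAG with Amazon Bedrock Knowledge Bases.
    # The knowledge base ID and model ARN below are placeholders.
    client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

    response = client.retrieve_and_generate(
        input={"text": "Which of our products are rated for outdoor use?"},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": "YOUR_KB_ID",
                "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/meta.llama3-1-8b-instruct-v1:0",
            },
        },
    )
    print(response["output"]["text"])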

Fine-tuning

Fine-tuning is a powerful way to customize FMs for specific tasks or domains using additional training data. In fine-tuning, you adjust the model's parameters using a smaller, labeled dataset relevant to the target domain.

For example, to build an AI-powered customer service chatbot, you can fine-tune an existing FM using your own dataset to handle questions about a company's product features. By training the model on historical customer interactions and product specifications, the fine-tuned model learns the context and the company's messaging tone to provide more accurate responses.

If the company launches a new product, the model should be fine-tuned again with new data to update its knowledge and maintain relevance. Fine-tuning helps make sure that the model can deliver precise, context-aware responses. However, it requires more computational resources and time compared to RAG, because the model itself needs to be retrained with the new data.

Hybrid approach

The hybrid approach combines the strengths of RAG and fine-tuning to deliver highly accurate, context-aware responses. Consider an example: a company frequently updates the features of its products. They want to customize their FM using internal data, but keeping the model updated with changes in the product catalog is challenging. Because product features change monthly, keeping the model up to date would be costly and time-consuming.

By adopting a hybrid approach, the company can reduce costs and improve efficiency. They can fine-tune the model every couple of months to keep it aligned with the company’s overall tone. Meanwhile, RAG can retrieve the latest product information from the company’s knowledge base, helping to make sure that responses are up-to-date. Fine-tuning the model also enhances RAG’s performance during the generation phase, leading to more coherent and contextually relevant responses. If you want to further improve the retrieval phase, you can customize the embedding model, use a different search algorithm, or explore other retrieval optimization techniques.
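As one example of tuning only the retrieval phase, the sketch below overrides the vector search settings in an Amazon Bedrock Retrieve call. The knowledge base ID is a placeholder, and hybrid search is only supported by certain vector stores.

    import boto3

    client = boto3.client("bedrock-agent-runtime")

    # Sketch: ask for more chunks and switch to hybrid (keyword + vector) search.
    response = client.retrieve(
        knowledgeBaseId="YOUR_KB_ID",  # placeholder
        retrievalQuery={"text": "latest features of product X"},
        retrievalConfiguration={
            "vectorSearchConfiguration": {
                "numberOfResults": 5,
                "overrideSearchType": "HYBRID",  # or "SEMANTIC"
            }
        },
    )
    for result in response["retrievalResults"]:
        print(result["content"]["text"])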

The following sections provide the background for dataset creation and the implementation of the three different approaches.

Prerequisites

To deploy the solution, you need:

Dataset description

For the proof-of-concept, we created two synthetic datasets using Anthropic’s Claude 3 Sonnet on Amazon Bedrock.

Product catalog dataset

This dataset is your primary knowledge source in Amazon Bedrock. We created a product catalog that consists of 15 fictitious manufacturing products by prompting Anthropic's Claude 3 Sonnet with example product catalogs. You should create your dataset in .txt format. The format in the example for this post has the following fields:

Training and test datasets

We use the same product catalog we created for the RAG approach as training data to run domain adaptation fine-tuning.

The test dataset consists of question-and-answer pairs about the product catalog dataset created earlier. We used the code in the Question-Answer Dataset section of the Jupyter notebook to generate the test dataset.
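The notebook's exact prompt isn't reproduced here, but generating question-and-answer pairs with Anthropic's Claude 3 Sonnet on Amazon Bedrock can look roughly like the following sketch; the prompt, file path, and pair count are illustrative.

    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Illustrative prompt; the repo's Question-Answer Dataset notebook defines its own.
    catalog = open("data/product_catalog.txt").read()
    prompt = (
        "Based on the following product catalog, generate 5 question-and-answer "
        f"pairs a customer might ask, one pair per line:\n\n{catalog}"
    )

    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    print(response["output"]["message"]["content"][0]["text"])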

Implementation

We implemented three different approaches: RAG, fine-tuning, and hybrid. See the Readme file for instructions to deploy the whole solution.

RAG

The RAG approach uses Amazon Bedrock Knowledge Bases and consists of two main parts.

To set up the infrastructure:

    1. Update the config file with your required data (details in the Readme).
    2. Run the following commands in the infrastructure folder:

       cd infrastructure
       ./prepare.sh
       cdk bootstrap aws://<<ACCOUNT_ID>>/<<REGION>>
       cdk synth
       cdk deploy --all

Context retrieval and response generation:

    1. The system finds relevant information by searching the knowledge base with the user's question.
    2. It sends both the user's question and the retrieved information to the Meta Llama 3.1 8B model on Amazon Bedrock.
    3. The LLM generates a response based on the user's question and the retrieved information.
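A condensed sketch of these steps with boto3 follows; the knowledge base ID and prompt layout are assumptions, and the deployed stack wires these pieces together differently.

    import boto3

    agent_runtime = boto3.client("bedrock-agent-runtime")
    bedrock = boto3.client("bedrock-runtime")

    question = "Does product X support outdoor use?"

    # Step 1: retrieve relevant chunks from the knowledge base (ID is a placeholder).
    retrieval = agent_runtime.retrieve(
        knowledgeBaseId="YOUR_KB_ID",
        retrievalQuery={"text": question},
    )
    context = "\n".join(r["content"]["text"] for r in retrieval["retrievalResults"])

    # Step 2: send the question plus retrieved context to Meta Llama 3.1 8B on Bedrock.
    response = bedrock.converse(
        modelId="meta.llama3-1-8b-instruct-v1:0",
        messages=[{
            "role": "user",
            "content": [{"text": f"Context:\n{context}\n\nQuestion: {question}"}],
        }],
    )
    print(response["output"]["message"]["content"][0]["text"])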

Fine-tuning

We used Amazon SageMaker AI JumpStart to fine-tune the Meta Llama 3.1 8B Instruct model with the domain adaptation method for 5 epochs. You can adjust the fine-tuning parameters in the config.py file.
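For orientation, here is a condensed sketch of what a JumpStart domain adaptation fine-tuning flow can look like. The model ID, instance type, and hyperparameter names are assumptions that vary by model version; treat the repo and config.py as the source of truth.

    from sagemaker.jumpstart.estimator import JumpStartEstimator

    # Sketch: domain adaptation fine-tuning with SageMaker JumpStart.
    # Model ID, instance type, and hyperparameter names are assumptions.
    estimator = JumpStartEstimator(
        model_id="meta-textgeneration-llama-3-1-8b-instruct",
        environment={"accept_eula": "true"},
        instance_type="ml.g5.12xlarge",
    )
    estimator.set_hyperparameters(epoch="5", instruction_tuned="False")
    estimator.fit({"training": "s3://your-bucket/product-catalog/"})

    # Deploy a real-time endpoint for inference with the fine-tuned model.
    predictor = estimator.deploy()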

Hybrid

The hybrid approach combines RAG and fine-tuning, and uses the following high-level steps:

    1. Retrieve the most relevant context from the knowledge base based on the user's question.
    2. The fine-tuned model generates answers using the retrieved context.

You can customize the prompt template in the config.py file.
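Putting the two steps together, a simplified sketch of the hybrid flow might look like this; the endpoint name, knowledge base ID, and payload format are assumptions, and the repo reads the real values and prompt template from config.py.

    import json

    import boto3

    agent_runtime = boto3.client("bedrock-agent-runtime")
    sm_runtime = boto3.client("sagemaker-runtime")

    question = "What is the warranty period for product X?"

    # Step 1: retrieve the most relevant context from the knowledge base.
    retrieval = agent_runtime.retrieve(
        knowledgeBaseId="YOUR_KB_ID",  # placeholder
        retrievalQuery={"text": question},
    )
    context = "\n".join(r["content"]["text"] for r in retrieval["retrievalResults"])

    # Step 2: generate with the fine-tuned model behind a SageMaker endpoint.
    payload = {"inputs": f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"}
    response = sm_runtime.invoke_endpoint(
        EndpointName="finetuned-llama-endpoint",  # placeholder
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    print(json.loads(response["Body"].read()))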

Evaluation

For this example, we use three evaluation metrics to measure performance. You can modify src/evaluation.py to implement your own metrics.

Each metric helps you understand a different aspect of how well each approach works. As shown in the results table that follows, we report BERTScore, an LLM evaluator score, and average inference time for each approach.

Figure 2: LLM evaluator method
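To make the first two metrics concrete, here is a hedged sketch of scoring a single answer with the bert-score package and an FM judge on Amazon Bedrock; the judge prompt and example strings are illustrative, and src/evaluation.py remains the reference implementation.

    import boto3
    from bert_score import score

    reference = "Product X has a 2-year warranty."
    candidate = "Product X is covered by a two-year warranty."

    # BERTScore: semantic similarity between the generated answer and the reference.
    _, _, f1 = score([candidate], [reference], lang="en")
    print(f"BERTScore F1: {f1.item():.4f}")

    # LLM evaluator: ask an FM on Bedrock to grade the answer (prompt is illustrative).
    bedrock = boto3.client("bedrock-runtime")
    judge_prompt = (
        f"Reference answer: {reference}\nCandidate answer: {candidate}\n"
        "Rate the candidate's factual correctness from 0 to 1. Reply with the number only."
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        messages=[{"role": "user", "content": [{"text": judge_prompt}]}],
    )
    print("LLM evaluator score:", response["output"]["message"]["content"][0]["text"])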

Based on those assumptions, the cost associated with each of the three approaches is calculated as follows:

Results

You can find detailed evaluation results in two places in the code repository. The individual scores for each sample are in the JSON files under data/output, and a summary of the results is in summary_results.csv in the same folder.

The results are shown in the following table:

Approach    | Average BERTScore | Average LLM evaluator score | Average inference time (seconds) | Cost per month (US East region)
RAG         | 0.8999            | 0.8200                      | 8.336                            | ~$350 + $198 ≈ $548
Fine-tuning | 0.8660            | 0.5556                      | 4.159                            | ~$1.77 + $5,105 ≈ $5,107
Hybrid      | 0.8908            | 0.8556                      | 17.700                           | ~$350 + $1.77 + $5,105 ≈ $5,457

Note that the costs for both the fine-tuning and hybrid approaches can decrease significantly, depending on the traffic pattern, if you set the real-time inference endpoint from SageMaker to scale down to zero instances when not in use.

Clean up

Follow the cleanup section in the Readme file in order to avoid paying for unused resources.

Conclusion

In this post, we showed you how to implement and evaluate three powerful techniques for tailoring FMs to your business needs: RAG, fine-tuning, and a hybrid approach combining both methods. We provided ready-to-use code to help you experiment with these approaches and make informed decisions based on your specific use case and dataset.

The results in this example were specific to the dataset that we used. For that dataset, RAG outperformed fine-tuning and achieved comparable results to the hybrid approach with a lower cost, but fine-tuning led to the lowest latency. Your results will vary depending on your dataset.

We encourage you to test these approaches using our code as a starting point:

    1. Add your own datasets in the data folder.
    2. Fill out the config.py file.
    3. Follow the rest of the Readme instructions to run the full evaluation.


About the Authors

Idil Yuksel is a Working Student Solutions Architect at AWS, pursuing her MSc. in Informatics with a focus on machine learning at the Technical University of Munich. She is passionate about exploring application areas of machine learning and natural language processing. Outside of work and studies, she enjoys spending time in nature and practicing yoga.

Karim Akhnoukh is a Senior Solutions Architect at AWS working with customers in the financial services and insurance industries in Germany. He is passionate about applying machine learning and generative AI to solve customers’ business challenges. Besides work, he enjoys playing sports, aimless walks, and good quality coffee.
