MarkTechPost@AI · December 3, 2024
DMQR-RAG: A Diverse Multi-Query Rewriting Framework Designed to Improve the Performance of Both Document Retrieval and Final Responses in RAG

 

Large language models (LLMs) face challenges with static knowledge and information accuracy. Retrieval-augmented generation (RAG) improves accuracy by incorporating external information, but noise and ambiguity in user queries degrade document retrieval. This article introduces DMQR-RAG, a framework that applies four query rewriting strategies operating at different information levels (GQR, KWR, PAR, CCE), combined with an adaptive strategy selection method, to improve both document retrieval and final response quality. Experiments show that DMQR-RAG significantly outperforms baseline methods on multiple datasets, performs especially well with smaller language models, achieves higher recall and precision, and has proven successful in real-world deployments.

🤔 **DMQR-RAG targets the poor document retrieval caused by noise and ambiguity in user queries within RAG systems.** The framework introduces four rewriting strategies at different information levels (GQR, KWR, PAR, CCE): for example, GQR refines a query by removing noise while preserving key information, and KWR extracts the keywords that search engines prefer, improving both the accuracy and the diversity of document retrieval.

🚀 **DMQR-RAG uses an adaptive strategy selection method that dynamically chooses the best rewriting strategies.** This avoids unnecessary rewrite operations and optimizes retrieval performance, reducing computational cost while preserving retrieval quality.

📊 **Experiments show that DMQR-RAG performs strongly on multiple datasets (AmbigNQ, HotpotQA, FreshQA), significantly outperforming baseline methods.** For example, on FreshQA, DMQR-RAG improved P@5 by 14.46% and surpassed RAG-Fusion on most metrics. It also adapted well to smaller language models, demonstrating its effectiveness in practical application scenarios.

💡 **DMQR-RAG has been deployed successfully in practice, improving retrieval across 15 million user queries.** It raised hit and precision rates, enhancing response correctness and relevance without sacrificing other performance metrics.

🌐 **DMQR-RAG's key innovations are its multi-strategy query rewriting and its adaptive strategy selection method, which markedly improve RAG retrieval performance and have delivered notable results in real-world applications.**

Large language models (LLMs) suffer from two common issues: a static knowledge base and hallucination, the generation of inaccurate or fabricated information. Because the parametric knowledge within LLMs is inherently static, providing up-to-date information in real-time scenarios is challenging. Retrieval-augmented generation (RAG) addresses this by integrating external, real-time information to enhance accuracy and relevance. However, noise, ambiguity, and deviations in intent within user queries often hinder effective document retrieval. Query rewriting plays an important role in refining such inputs so that retrieved documents match the user's actual intent more closely.

Current methods for query rewriting in RAG systems fall broadly into two categories: training-based and prompt-based approaches. Training-based methods rely on supervised fine-tuning with annotated data or on reinforcement learning, while prompt-based methods use prompt engineering to guide LLMs toward specific rewriting strategies. Although prompt-based methods are cost-effective, they often lack generalization and diversity. Multi-strategy rewriting addresses these issues by combining different prompt-based techniques to handle diverse query types and enhance retrieval diversity.

To address this, researchers from Renmin University of China, Southeast University, Beijing Jiaotong University, and Kuaishou Technology have proposed DMQR-RAG, a Diverse Multi-Query Rewriting framework. The framework uses four rewriting strategies that operate at different levels of information to improve performance over baseline approaches. It also introduces an adaptive strategy selection method that minimizes the number of rewrites while optimizing overall performance.

The DMQR-RAG framework introduces four rewriting strategies: GQR, KWR, PAR, and CCE. GQR refines a query by removing noise while preserving the relevant information, recovering the user's actual intent. KWR extracts the keywords that search engines prefer. PAR constructs a pseudo answer to expand the query with useful information, while CCE extracts the core information need from long, detailed queries.
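The four strategies above can be sketched as prompt templates, each producing one rewrite that feeds a separate retrieval call. This is a minimal illustration, not the paper's implementation: the prompt wording and the `call_llm` parameter are assumptions introduced here.

```python
# Hypothetical prompt templates for the four DMQR-RAG rewriting strategies.
# The exact wording is an assumption; only the strategy names come from the paper.
STRATEGY_PROMPTS = {
    # GQR: keep the user's intent, drop the noise.
    "GQR": "Rewrite the query concisely, removing noise but keeping its intent:\n{query}",
    # KWR: pull out search-engine-friendly keywords.
    "KWR": "Extract the key search keywords from the query:\n{query}",
    # PAR: generate a pseudo answer to enrich the search string.
    "PAR": "Write a short hypothetical answer to the query for use as an expanded search string:\n{query}",
    # CCE: distill the core information need from a long, detailed query.
    "CCE": "Extract the core information need from this detailed query:\n{query}",
}

def rewrite_query(query: str, strategy: str, call_llm) -> str:
    """Apply one rewriting strategy: fill its prompt template and call an LLM.
    `call_llm` is any callable mapping a prompt string to a completion string."""
    prompt = STRATEGY_PROMPTS[strategy].format(query=query)
    return call_llm(prompt)

def multi_rewrite(query: str, call_llm, strategies=("GQR", "KWR", "PAR", "CCE")):
    """Produce one rewrite per strategy; each rewrite drives its own retrieval."""
    return {s: rewrite_query(query, s, call_llm) for s in strategies}
```

Keeping `call_llm` as an injected callable makes the sketch testable with a stub and lets any LLM client be plugged in.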

The framework also includes an adaptive strategy selection method that dynamically determines the best rewriting strategies for any given query, avoiding superfluous rewrites and optimizing retrieval performance. A wide range of experiments in both academic and industrial settings validates that these methods significantly improve document retrieval and final response quality. The approach outperforms baseline methods with performance gains of roughly 10%, and it is particularly effective for smaller language models because it reduces unnecessary rewrites and minimizes retrieval noise. Across datasets such as AmbigNQ, HotpotQA, and FreshQA, the method achieved higher recall (H@5) and precision (P@5) than baselines; for example, DMQR-RAG improved P@5 by up to 14.46% on FreshQA and surpassed RAG-Fusion on most metrics. It also proved versatile, performing well with smaller LLMs and in real-world industry scenarios.
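A minimal sketch of the adaptive flow, under the assumption that an LLM-based selector picks a subset of strategies per query and that the documents retrieved for each rewrite are merged into one deduplicated candidate pool; the `classify` and `retrieve` callables are stand-ins introduced here, not the paper's API.

```python
def select_strategies(query: str, classify) -> list:
    """Adaptively pick only the rewriting strategies likely to help this query.
    `classify` stands in for an LLM-based selector returning strategy names."""
    picked = classify(query)     # e.g. ["GQR", "KWR"] for a noisy keyword-style query
    return picked or ["GQR"]     # fall back to one general rewrite, never zero

def fused_retrieve(query, rewrites, retrieve, top_k=5):
    """Retrieve with the original query plus every rewrite, then merge the
    result lists, dropping duplicate documents while preserving rank order."""
    seen, merged = set(), []
    for q in [query, *rewrites]:
        for doc_id in retrieve(q):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged[:top_k]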

In conclusion, DMQR-RAG addresses the challenge of improving retrieval relevance in RAG systems through a diverse multi-query rewriting framework and an adaptive strategy selection method. Its improvements include enhanced relevance and diversity of retrieved documents, which lead to better overall performance in retrieval-augmented generation. DMQR-RAG has proven effective in real-world scenarios, improving retrieval across 15 million user queries: it increases hit and precision rates while enhancing response correctness and relevance without sacrificing other performance metrics.


Check out the Paper. All credit for this research goes to the researchers of this project.


