MarkTechPost@AI · January 22
SlideGar: A Novel AI Approach to Use LLMs in Retrieval Reranking, Solving the Challenge of Bounded Recall

This article introduces the "retrieve and rank" approach used in document search systems and its bounded recall problem. Adaptive Retrieval (AR) was proposed to address this limitation. Researchers from the L3S Research Center in Germany and the University of Glasgow propose the SlideGar algorithm, which combines AR with LLM-based rerankers. Across a range of experiments, the algorithm performs strongly, improving nDCG@10 scores and recall while adding negligible latency.

📄 The "retrieve and rank" approach and its limitations

💡 Adaptive Retrieval (AR) proposed to address bounded recall

🎯 SlideGar combines AR with LLM rerankers

📈 SlideGar performs strongly in experiments

Among the various methods employed in document search systems, "retrieve and rank" has gained considerable popularity. In this method, the results of a retrieval model are re-ordered by a re-ranker. Moreover, with advances in generative AI and the development of Large Language Models (LLMs), rankers can now perform listwise reranking after analyzing complex patterns in language. However, a crucial problem exists that appears trivial yet limits the overall effectiveness of these cascading systems.

The bounded recall problem, in which a document is irrevocably excluded from the final ranked list if it was not retrieved in the initial phase, causes potentially relevant information to be lost. To solve this problem, researchers devised an adaptive retrieval process. Adaptive Retrieval (AR) differs from previous work by leveraging the ranker's assessments to expand the retrieval set dynamically. The clustering hypothesis is applied here to group similar documents that may be relevant to a query. AR can be understood as a pseudo-relevance feedback mechanism that increases the likelihood of including pertinent documents omitted during the initial retrieval.
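The feedback loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration (not the paper's implementation): it assumes a precomputed document-similarity graph standing in for the clustering hypothesis, and a black-box pointwise scoring function.

```python
def adaptive_retrieval(initial_docs, neighbors, score, budget):
    """Minimal adaptive-retrieval sketch: score documents from the
    first-stage retriever, then feed neighbors of the best-scoring
    document so far back into the candidate pool, up to `budget`
    scored documents. `neighbors` maps doc_id -> list of similar docs."""
    scored = {}                    # doc_id -> relevance score
    frontier = list(initial_docs)  # candidates from the first stage
    while frontier and len(scored) < budget:
        doc = frontier.pop(0)
        if doc in scored:
            continue
        scored[doc] = score(doc)
        # Clustering hypothesis: documents similar to a relevant one
        # are promising, so they join the candidate pool even if the
        # initial retrieval missed them.
        best = max(scored, key=scored.get)
        for n in neighbors.get(best, []):
            if n not in scored:
                frontier.append(n)
    return sorted(scored, key=scored.get, reverse=True)
```

Note how a document absent from `initial_docs` can still reach the final ranking, which is exactly what escapes the bounded-recall trap.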

Although AR serves as a robust solution in cascading systems, contemporary work in this vertical operates under the assumption that the relevance score depends only on the document and query, implying that one document’s score is computed independently of others. On the other hand, LLM-based ranking methods use signals from the entire ranked list to determine relevance. This article discusses the latest research that merges the benefits of LLMs with AR.
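The incompatibility comes down to the two ranker signatures. The toy functions below (illustrative only, not from the paper) make the contrast concrete:

```python
from typing import Callable, List, Sequence

# Pointwise: each (query, document) pair is scored independently of
# every other document in the candidate set.
PointwiseRanker = Callable[[str, str], float]

# Listwise: the ranker sees the whole candidate list at once, so one
# document's final position can depend on the others in the window.
ListwiseRanker = Callable[[str, Sequence[str]], List[str]]

def overlap_score(query: str, doc: str) -> float:
    """Toy pointwise scorer: word overlap between query and document."""
    return float(len(set(query.split()) & set(doc.split())))

def rank_by_overlap(query: str, docs: Sequence[str]) -> List[str]:
    """Toy listwise ranker: returns an ordering, not per-doc scores."""
    return sorted(docs, key=lambda d: overlap_score(query, d), reverse=True)
```

Classic AR assumes the first signature; an LLM reranker provides only the second, which is the gap SlideGar bridges.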

Researchers from the L3S Research Center, Germany, and the University of Glasgow have put forth SlideGar: Sliding Window-based Adaptive Retrieval to integrate AR with LLMs while accounting for the fundamental differences between their pointwise and listwise approaches. SlideGar modifies AR such that the resulting ranking function outputs a ranked order of documents rather than discrete relevance scores. The proposed algorithm merges results from the initial ranking with feedback documents provided by the most relevant documents identified up to that point.

The SlideGar algorithm utilizes AR methods like graph-based adaptive retrieval (Gar) and query affinity modeling-based adaptive retrieval (Quam) to find document neighbors in a constant amount of time. For LLM ranking, the authors employ a sliding window to overcome the constraint of input context. SlideGar processes the initial pool of documents given by the retriever for a specific query and, for a predefined length and step size, ranks the top w documents from left to right using a listwise ranker. These documents are then removed from the pool. The authors used the reciprocal of the rank as a pseudo-score for the documents.
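The sliding-window pass described above can be sketched as follows. This is a simplified, hypothetical rendering, not the authors' code: the listwise ranker is a black box, and the adaptive feedback step (pulling in neighbors of top-ranked documents) is abstracted behind an assumed `expand` function.

```python
def slidegar(pool, listwise_rank, expand, window=4, step=2):
    """Simplified SlideGar sketch: rank the current window with a
    listwise ranker, emit the top `step` documents as final, and refill
    the pool with feedback documents (e.g. graph neighbors) of the docs
    ranked so far. Returns the final order and reciprocal-rank scores."""
    final = []
    pool = list(pool)
    while pool:
        top = listwise_rank(pool[:window])       # rank one window
        emitted, rest = top[:step], top[step:]
        final.extend(emitted)
        # Adaptive feedback: neighbors of freshly ranked docs join the
        # remaining candidates, skipping docs already seen.
        feedback = [d for d in expand(emitted)
                    if d not in final and d not in pool]
        pool = rest + pool[window:] + feedback
    # The reciprocal of the final rank serves as a pseudo-score.
    scores = {doc: 1.0 / (i + 1) for i, doc in enumerate(final)}
    return final, scores
```

Because each window triggers one listwise call regardless of how many feedback documents flow in, the number of LLM inferences stays constant, matching the efficiency claim below.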

The authors used the MS MARCO corpus for their experiments and evaluated performance on the TREC Deep Learning 2019 and 2020 query sets, using the latest, de-duplicated versions of these datasets. A variety of sparse and dense retrievers were utilized. For ranking, they employed several listwise rankers, including both zero-shot and fine-tuned models, applied via the open-source Python library ReRankers.

After conducting an extensive set of experiments across diverse LLM re-rankers, first-stage retrievers, and feedback documents, the authors found that SlideGar improved the nDCG@10 score by up to 13% and recall by 28% over the SOTA listwise rankers, with a constant number of LLM inferences. Regarding computation, the proposed method adds negligible latency (a mere 0.02%).

Conclusion: In this research paper, the authors propose a new algorithm, SlideGar, that allows LLM re-rankers to address the challenge of bounded recall in retrieval. SlideGar merges the functionalities of AR and LLM re-rankers to complement each other. This work paves the way for researchers to further explore and adapt LLMs for ranking purposes.


Check out the Paper. All credit for this research goes to the researchers of this project.



