MarkTechPost@AI, July 18, 2024
G-Retriever: Advancing Real-World Graph Question Answering with RAG and LLMs

Large Language Models (LLMs) have made significant strides in artificial intelligence, but their ability to process complex structured data, particularly graphs, remains challenging. In our interconnected world, a substantial portion of real-world data inherently possesses a graph structure, including the Web, e-commerce systems, and knowledge graphs. Many of these involve textual graphs, making them suitable for LLM-centric methods. While efforts have been made to combine graph-based technologies like Graph Neural Networks (GNNs) with LLMs, they primarily focus on conventional graph tasks or simple questions on small graphs. The research aims to develop a flexible question-answering framework for complex, real-world graphs, enabling users to interact with their graph data through a unified conversational interface.

Prior work has combined graph-based techniques with LLMs across several areas, including general graph models, multi-modal architectures, and practical applications such as fundamental graph reasoning, node classification, and graph classification/regression. Retrieval-augmented generation (RAG) has emerged as a promising way to mitigate hallucination in LLMs and improve trustworthiness. While successful in language tasks, RAG remains underexplored for general graph tasks, with most existing work focusing on knowledge graphs. Parameter-efficient fine-tuning (PEFT) techniques have also played a crucial role in refining LLMs, enabling sophisticated multimodal models. However, applying these advanced techniques to graph-specific LLMs is still in its early stages.

Researchers from the National University of Singapore, University of Notre Dame, Loyola Marymount University, New York University, and Meta AI propose G-Retriever, an innovative architecture designed for GraphQA, integrating the strengths of GNNs, LLMs, and RAG. This framework enables efficient fine-tuning while preserving the LLM’s pre-trained language capabilities by freezing the LLM and using a soft prompting approach on the GNN’s output. G-Retriever’s RAG-based design mitigates hallucinations through direct retrieval of graph information, allowing it to scale to graphs exceeding the LLM’s context window size. The architecture adapts RAG to graphs by formulating subgraph retrieval as a Prize-Collecting Steiner Tree (PCST) optimization problem, enhancing explainability by returning the retrieved subgraph.
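The soft-prompting idea can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: the mean-pooling readout, the embedding-table stand-in for the LLM, and all class and variable names here are assumptions. The point it shows is the mechanism described above: the LLM's weights stay frozen, while a trainable projection maps the graph encoder's output into the LLM's embedding space as a single "graph token" prepended to the text tokens.

```python
import torch
import torch.nn as nn

class GraphSoftPrompt(nn.Module):
    """Minimal sketch of graph soft prompting (illustrative only; the real
    model uses a Graph Attention Network encoder and a full LLM)."""

    def __init__(self, gnn_dim: int, llm_dim: int):
        super().__init__()
        # Stand-in for the LLM's token-embedding table; frozen, like the LLM.
        self.llm_embed = nn.Embedding(1000, llm_dim)
        self.llm_embed.weight.requires_grad = False
        # Trainable projection from the graph encoder's space to the LLM's space.
        self.proj = nn.Linear(gnn_dim, llm_dim)

    def forward(self, node_embs: torch.Tensor, token_ids: torch.Tensor):
        graph_vec = node_embs.mean(dim=0)       # mean-pool stands in for the GNN readout
        graph_token = self.proj(graph_vec)      # align with the LLM's vector space
        text_embs = self.llm_embed(token_ids)   # (seq_len, llm_dim)
        # Prepend the graph token to the text-token embeddings.
        return torch.cat([graph_token.unsqueeze(0), text_embs], dim=0)

model = GraphSoftPrompt(gnn_dim=64, llm_dim=128)
out = model(torch.randn(5, 64), torch.tensor([1, 2, 3]))
print(out.shape)  # one graph token + 3 text tokens
```

During fine-tuning, gradients flow only through the projection (and, in the full system, the graph encoder), which is what keeps the approach parameter-efficient.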

G-Retriever’s architecture comprises four main steps: indexing, retrieval, subgraph construction, and generation. In the indexing step, node and graph embeddings are generated using a pre-trained language model and stored in a nearest-neighbor data structure. The retrieval step uses k-nearest neighbors to identify the most relevant nodes and edges for a given query. Subgraph construction employs the Prize-Collecting Steiner Tree algorithm to create a manageable, relevant subgraph. The generation step involves a Graph Attention Network for encoding the subgraph, a projection layer to align the graph token with the LLM’s vector space, and a text embedder to transform the subgraph into a textual format. Finally, the LLM generates an answer using graph prompt tuning, combining the graph token and text embedder output.
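The retrieval step and its hand-off to the PCST stage can be sketched as follows. This is a simplified NumPy illustration under stated assumptions: the function name, the rank-based prize scheme, and the toy data are all hypothetical, and a real PCST solver would then pick a subtree maximizing node prizes minus edge costs. What it shows is the mechanism above: cosine similarity between the query embedding and node embeddings selects the top-k nodes, which receive prizes for subgraph construction.

```python
import numpy as np

def retrieve_and_prize(query_emb, node_embs, k=3):
    """Rank the graph's nodes by cosine similarity to the query and assign
    rank-based prizes (k for the best match, down to 1) to the top-k nodes;
    all other nodes get prize 0 for the downstream PCST step."""
    q = query_emb / np.linalg.norm(query_emb)
    n = node_embs / np.linalg.norm(node_embs, axis=1, keepdims=True)
    sims = n @ q                          # cosine similarity per node
    topk = np.argsort(-sims)[:k]          # indices of the k most similar nodes
    prizes = np.zeros(len(node_embs))
    prizes[topk] = np.arange(k, 0, -1)    # rank-based prizes: k, k-1, ..., 1
    return topk, prizes

rng = np.random.default_rng(0)
nodes = rng.normal(size=(10, 16))         # toy node embeddings
query = nodes[4] + 0.01 * rng.normal(size=16)  # a query close to node 4
topk, prizes = retrieve_and_prize(query, nodes, k=3)
print(topk[0], prizes[4])                 # node 4 ranks first and gets the top prize
```

The same scheme extends to edges; balancing total prize against edge costs is what lets the PCST step return a small, connected, query-relevant subgraph instead of the k nearest nodes in isolation.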

G-Retriever demonstrates superior performance across three datasets in various configurations, outperforming baselines in inference-only settings and showing significant improvements with prompt tuning and LoRA fine-tuning. The method greatly enhances efficiency by reducing token and node counts, leading to faster training times. It reduces hallucination by 54% compared to baselines. An ablation study reveals the importance of all components, particularly the graph encoder and the textualized graph. G-Retriever proves robust to different graph encoders and benefits from larger LLM scales, showcasing its effectiveness in graph question-answering tasks.

This work introduces a new GraphQA benchmark for real-world graph question answering and presents G-Retriever, an architecture designed for complex graph queries. Unlike previous approaches focusing on conventional graph tasks or simple queries, G-Retriever targets real-world textual graphs across multiple applications. The method implements a RAG approach for general textual graphs, using soft prompting for enhanced graph understanding. G-Retriever employs Prize-Collecting Steiner Tree optimization to perform RAG over graphs, enabling resistance to hallucination and handling of large-scale graphs. Experimental results demonstrate G-Retriever’s superior performance over baselines in various textual graph tasks, effective scaling with larger graphs, and significant reduction in hallucination.


