MarkTechPost@AI · August 12, 2024
Apple Researchers Present KGLens: A Novel AI Method Tailored for Visualizing and Evaluating the Factual Knowledge Embedded in LLMs

Apple researchers have introduced a new method called KGLENS for assessing the knowledge accuracy and reliability of large language models (LLMs). KGLENS probes an LLM's knowledge with a parameterized knowledge graph (PKG) and a Thompson-sampling-inspired method to identify its knowledge blind spots. A distinctive feature of KGLENS is its use of GPT-4 to convert knowledge-graph edges into natural-language questions, evaluating the LLM's knowledge through question-answering tasks. The framework was evaluated across a range of LLMs, with the GPT-4 family consistently outperforming other models. KGLENS also provides a visualization tool that helps users inspect an LLM's knowledge structure more intuitively.

🍎 **KGLENS: a novel evaluation framework** KGLENS assesses the knowledge accuracy and reliability of large language models (LLMs) and identifies their knowledge blind spots. It uses a parameterized knowledge graph (PKG) and a Thompson-sampling-inspired method to probe the LLM's knowledge through question-answering tasks, updating the PKG based on the results.

🧠 **Knowledge-graph-driven question answering** KGLENS uses GPT-4 to convert knowledge-graph edges into natural-language questions, with two question types, fact-checking and fact-QA, designed to reduce answer ambiguity. Human evaluators judged 97.7% of the generated questions to be sensible.

📊 **Evaluation results** KGLENS was evaluated across a range of LLMs; the GPT-4 family, including GPT-4, GPT-4o, and GPT-4-turbo, consistently outperformed the others. GPT-4o is more cautious when handling personal information. A significant gap separates GPT-3.5-turbo from GPT-4, with GPT-3.5-turbo sometimes performing worse than legacy LLMs. Legacy models such as Babbage-002 and Davinci-002 performed only slightly better than random guessing, highlighting the progress made by recent LLMs.

📈 **Advantages of KGLENS** By combining a PKG with a Thompson-sampling-inspired method, KGLENS provides an efficient and accurate way to evaluate LLM knowledge, overcoming limitations of existing approaches such as data leakage, static content, and limited metrics. It also provides a visualization tool that helps users inspect an LLM's knowledge structure more intuitively.

🤝 **Outlook** KGLENS and its assessment of knowledge graphs will be made available to the research community to foster collaboration. For businesses, the tool can help build more reliable AI systems, enhancing user experience and improving model knowledge. KGLENS represents a significant step toward more accurate and dependable AI applications.

Large Language Models (LLMs) have gained significant attention for their versatility, but their factualness remains a critical concern. Studies have revealed that LLMs can produce nonfactual, hallucinated, or outdated information, undermining reliability. Current evaluation methods, such as fact-checking and fact-QA, face several challenges. Fact-checking struggles to assess the factualness of generated content, while fact-QA encounters difficulties scaling up evaluation data due to expensive annotation processes. Both approaches also risk data contamination from web-crawled pretraining corpora. Also, LLMs often respond inconsistently to the same fact when it is presented in different forms, a challenge that existing evaluation datasets are ill-equipped to address.

Existing attempts to evaluate LLMs’ knowledge primarily use specific datasets, but face challenges like data leakage, static content, and limited metrics. Knowledge graphs (KGs) offer advantages in customization, evolving knowledge, and reduced test set leakage. Methods like LAMA and LPAQA use KGs for evaluation but struggle with unnatural question formats and impracticality for large KGs. KaRR overcomes some issues but remains inefficient for large graphs and lacks generalizability. Current approaches focus on accuracy over reliability, failing to address LLMs’ inconsistent responses to the same fact. Also, no existing work visualizes LLMs’ knowledge using KGs, presenting an opportunity for improvement. These limitations highlight the need for more comprehensive and efficient methods to evaluate and understand LLMs’ knowledge retention and accuracy.

Researchers from Apple introduced KGLENS, an innovative knowledge probing framework that has been developed to measure knowledge alignment between KGs and LLMs and identify LLMs’ knowledge blind spots. The framework employs a Thompson sampling-inspired method with a parameterized knowledge graph (PKG) to probe LLMs efficiently. KGLENS features a graph-guided question generator that converts KGs into natural language using GPT-4, designing two types of questions (fact-checking and fact-QA) to reduce answer ambiguity. Human evaluation shows that 97.7% of generated questions are sensible to annotators.

KGLENS employs a unique approach to efficiently probe LLMs' knowledge using a PKG and a Thompson-sampling-inspired method. The framework initializes a PKG in which each edge is augmented with a beta distribution indicating the LLM's potential deficiency on that edge. It then samples edges according to these probabilities, generates questions from the sampled edges, and examines the LLM through a question-answering task. The PKG is updated based on the results, and the process iterates until convergence. The framework's graph-guided question generator converts KG edges into natural-language questions using GPT-4, creating two types of questions: Yes/No questions for judgment and Wh-questions for generation, with the question type controlled by the graph structure. Entity aliases are included to reduce ambiguity.
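The probing loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `Edge`, `probe`, and `ask_llm` names are assumptions, and the real system generates questions with GPT-4 rather than taking a black-box correctness oracle.

```python
import random

class Edge:
    """A KG edge augmented with a Beta(alpha, beta) over the LLM's failure probability."""
    def __init__(self, subj, rel, obj):
        self.subj, self.rel, self.obj = subj, rel, obj
        self.alpha, self.beta = 1.0, 1.0  # uninformative prior

    def sample_deficiency(self):
        # Thompson sampling: draw from Beta(alpha, beta);
        # higher draws mean a more strongly suspected blind spot
        return random.betavariate(self.alpha, self.beta)

    def update(self, llm_failed):
        # Beta-Bernoulli posterior update from the QA outcome
        if llm_failed:
            self.alpha += 1
        else:
            self.beta += 1

def probe(edges, ask_llm, rounds=100, batch=5):
    """Repeatedly question the LLM on the edges it is most likely wrong about."""
    for _ in range(rounds):
        ranked = sorted(edges, key=lambda e: e.sample_deficiency(), reverse=True)
        for edge in ranked[:batch]:
            failed = not ask_llm(edge)  # ask_llm returns True if answered correctly
            edge.update(failed)
    # Posterior mean failure rate per edge after probing
    return {(e.subj, e.rel, e.obj): e.alpha / (e.alpha + e.beta) for e in edges}
```

Because edges the model keeps answering correctly accumulate evidence in `beta`, their sampled deficiency shrinks and probing effort concentrates on likely blind spots, which is what makes the approach efficient for large graphs.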

For answer verification, KGLENS instructs LLMs to generate specific response formats and employs GPT-4 to check the correctness of responses for Wh-questions. The framework’s efficiency is evaluated through various sampling methods, demonstrating its effectiveness in identifying LLMs’ knowledge blind spots across diverse topics and relationships.
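A hedged sketch of this two-tier verification: Yes/No answers are checked against the expected token, while Wh-answers are first matched against the gold entity and its aliases, falling back to a judge model (GPT-4 in the paper) only when no alias matches. The `verify` function and its response formats are illustrative assumptions, not KGLENS's actual API.

```python
def verify(question_type, llm_response, gold, judge=None):
    """Check an LLM's answer against the gold label.

    question_type: "yes_no" (gold is a bool) or "wh" (gold is a list of
    acceptable entity aliases). `judge` is an optional callable standing in
    for a GPT-4 correctness check on free-form answers.
    """
    resp = llm_response.strip().lower()
    if question_type == "yes_no":
        # Fact-checking questions instruct the LLM to lead with Yes/No
        return resp.startswith("yes") if gold else resp.startswith("no")
    # Wh-questions: accept a verbatim match of any alias first
    if any(alias.lower() in resp for alias in gold):
        return True
    # Otherwise defer to the judge model, if one is provided
    return judge(resp, gold) if judge else False
```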

KGLENS evaluation across various LLMs reveals that the GPT-4 family consistently outperforms other models. GPT-4, GPT-4o, and GPT-4-turbo show comparable performance, with GPT-4o being more cautious with personal information. A significant gap exists between GPT-3.5-turbo and GPT-4, with GPT-3.5-turbo sometimes performing worse than legacy LLMs due to its conservative approach. Legacy models like Babbage-002 and Davinci-002 show only slight improvement over random guessing, highlighting the progress in recent LLMs. The evaluation provides insights into different error types and model behaviors, demonstrating the varying capabilities of LLMs in handling diverse knowledge domains and difficulty levels.

KGLENS introduces an efficient method for evaluating factual knowledge in LLMs using a Thompson sampling-inspired approach with parameterized Knowledge Graphs. The framework outperforms existing methods in revealing knowledge blind spots and demonstrates adaptability across various domains. Human evaluation confirms its effectiveness, achieving 95.7% accuracy. KGLENS and its assessment of KGs will be made available to the research community, fostering collaboration. For businesses, this tool facilitates the development of more reliable AI systems, enhancing user experiences and improving model knowledge. KGLENS represents a significant advancement in creating more accurate and dependable AI applications.


Check out the Paper. All credit for this research goes to the researchers of this project.



