Rationale-guided Prompting for Knowledge-based Visual Question Answering

cs.AI updates on arXiv.org, July 31, 12:48

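This paper proposes the PLRH framework, which uses rationale heuristics to prompt LLMs to generate intermediate thought processes and thereby improve performance on knowledge-based visual question answering. Experiments show that the method outperforms existing baselines on both OK-VQA and A-OKVQA.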

arXiv:2412.16936v2 Announce Type: replace-cross

Abstract: Recently, Large Language Models (LLMs) have been used for knowledge-based Visual Question Answering (VQA). Despite encouraging results, prior methods prompt LLMs to predict answers directly and neglect intermediate thought processes. We argue that such direct prompting does not sufficiently activate the capacities of LLMs. We propose PLRH, a framework that Prompts LLMs with Rationale Heuristics for knowledge-based VQA. PLRH first prompts LLMs with Chain of Thought (CoT) to generate rationale heuristics, i.e., intermediate thought processes, and then uses these rationale heuristics to guide the LLMs in predicting answers. Experiments show that our approach outperforms existing baselines by more than 2.2 and 2.1 on OK-VQA and A-OKVQA, respectively.
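The two-stage flow described in the abstract (first elicit a rationale with a Chain-of-Thought prompt, then feed that rationale back when asking for the answer) might be sketched roughly as below. This is only an illustrative sketch, not the authors' implementation: the function names, the prompt wording, the "Let's think step by step." trigger, the use of an image caption as the textual stand-in for the image, and the `stub_llm` callable are all assumptions introduced here.

```python
from typing import Callable, Tuple

# Hypothetical type: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


def build_rationale_prompt(caption: str, question: str) -> str:
    """Stage 1 prompt: ask for intermediate reasoning only (assumed wording)."""
    return (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        "Let's think step by step."
    )


def build_answer_prompt(caption: str, question: str, rationale: str) -> str:
    """Stage 2 prompt: include the generated rationale as a hint (assumed wording)."""
    return (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        f"Rationale: {rationale}\n"
        "Answer:"
    )


def answer_with_rationale(llm: LLM, caption: str, question: str) -> Tuple[str, str]:
    """Run both stages and return (rationale, answer)."""
    rationale = llm(build_rationale_prompt(caption, question)).strip()
    answer = llm(build_answer_prompt(caption, question, rationale)).strip()
    return rationale, answer


def stub_llm(prompt: str) -> str:
    """Toy stand-in for a real LLM call, used only to make the sketch runnable."""
    if prompt.endswith("Answer:"):
        return " umbrella"
    return " It is raining, so the person needs protection from rain."


if __name__ == "__main__":
    rationale, answer = answer_with_rationale(
        stub_llm,
        caption="A person walking down a wet street under dark clouds.",
        question="What object would help this person stay dry?",
    )
    print("Rationale:", rationale)
    print("Answer:", answer)
```

In practice `stub_llm` would be replaced by a call to an actual model, and the prompts would likely carry in-context examples; the abstract does not specify those details, so they are left out here.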

Tags

LLMs, knowledge-based visual question answering, PLRH framework, intermediate thought processes, performance improvement
