Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering

cs.AI updates on arXiv.org, 8 hours ago

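This paper proposes a visual question answering method built on a multi-agent voting framework. By simulating team collaboration and tool use, it improves the performance of LLMs on VQA tasks. Experiments show that the method outperforms other baselines by 2.2 and 1.0 on the OK-VQA and A-OKVQA datasets, respectively.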

arXiv:2412.18351v2 Announce Type: replace-cross

Abstract: Large Language Models (LLMs) have achieved impressive results in knowledge-based Visual Question Answering (VQA). However, existing methods still face two challenges: they cannot use external tools autonomously, and they cannot work in teams. Humans usually know whether they need external tools when they encounter a new question: they can often answer a familiar question directly, whereas they turn to tools such as search engines for an unfamiliar one. Humans also tend to collaborate and discuss with others to get better answers. Inspired by this, we propose a multi-agent voting framework. We design three LLM-based agents that simulate different levels of staff in a team, and assign the available tools according to those levels. Each agent provides its own answer, and the final answer is chosen by a vote over all the agents' answers. Experiments on OK-VQA and A-OKVQA show that our approach outperforms other baselines by 2.2 and 1.0, respectively.
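To make the voting step concrete, here is a minimal Python sketch of three agents answering a question and a majority vote picking the final answer. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the voting rule, the tie-breaking, or which tools each level receives, so the plain majority vote, the first-answer tie-break, the `Agent` class, the `toy_knowledge_base` and `toy_search` tools, and the stub answer functions (standing in for real LLM calls) are all hypothetical. The image input is omitted for brevity.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A tool maps a question to retrieved text (hypothetical stand-in for a
# search engine or knowledge base; not an API from the paper).
Tool = Callable[[str], str]


@dataclass
class Agent:
    """One team member. `answer_fn` stands in for an LLM call."""
    name: str
    answer_fn: Callable[[str, Dict[str, Tool]], str]
    tools: Dict[str, Tool] = field(default_factory=dict)

    def answer(self, question: str) -> str:
        return self.answer_fn(question, self.tools)


def normalize(answer: str) -> str:
    # Compare answers case-insensitively and ignore surrounding whitespace.
    return answer.strip().lower()


def vote(answers: List[str]) -> str:
    """Assumed rule: plain majority vote; ties go to the earliest answer."""
    if not answers:
        raise ValueError("need at least one answer to vote on")
    counts = Counter(normalize(a) for a in answers)
    best = max(counts.values())
    for a in answers:
        if counts[normalize(a)] == best:
            return normalize(a)
    raise AssertionError("unreachable: some answer always has the top count")


def answer_with_team(question: str, team: List[Agent]) -> str:
    return vote([agent.answer(question) for agent in team])


# Toy tools and agents, purely illustrative.
def toy_knowledge_base(question: str) -> str:
    return "golden retriever" if "dog" in question else ""


def toy_search(question: str) -> str:
    return "Golden Retriever" if "dog" in question else ""


team = [
    # Lowest level: no tools, answers directly.
    Agent("junior", lambda q, tools: "labrador"),
    # Middle level: one tool, falls back to a direct guess.
    Agent("senior", lambda q, tools: tools["kb"](q) or "labrador",
          {"kb": toy_knowledge_base}),
    # Highest level: two tools, tries search first.
    Agent("manager", lambda q, tools: tools["search"](q) or tools["kb"](q),
          {"kb": toy_knowledge_base, "search": toy_search}),
]

final_answer = answer_with_team("What breed is the dog?", team)
```

In this toy run the two tool-using agents agree after normalization, so their answer outvotes the direct guess from the agent without tools. A real system would replace each `answer_fn` with a model call that also receives the image.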
