arXiv:2508.04350v1 Announce Type: cross Abstract: Reasoning capabilities in large language models (LLMs) have advanced substantially through methods such as chain-of-thought prompting and explicit step-by-step explanations. However, these improvements have not yet fully carried over to multimodal contexts, where models must proactively decide which sensory modalities, such as vision, audio, or spatial perception, to engage when interacting with complex real-world environments. In this paper, we introduce the Chain of Questions (CoQ) framework, a curiosity-driven reasoning approach that encourages multimodal language models to dynamically generate targeted questions about their surroundings. These questions guide the model to selectively activate the relevant modalities, gathering the information needed for accurate reasoning and response generation. We evaluate our framework on a novel multimodal benchmark assembled by integrating the WebGPT, ScienceQA, AVSD, and ScanQA datasets. Experimental results demonstrate that CoQ improves a foundation model's ability to identify and integrate pertinent sensory information, yielding gains in accuracy, interpretability, and alignment of the reasoning process across diverse multimodal tasks.
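The abstract describes the CoQ control flow only at a high level: generate a question about the environment, route it to the modality best suited to answer it, fold the observation back into the working context, and repeat until the model can answer. The sketch below illustrates that loop under stated assumptions; the function names (generate_question, route_modality, query_modality, answer_or_none), the keyword router, and the stopping rule are illustrative stand-ins, not the paper's actual prompts, models, or routing policy.

```python
# Minimal, illustrative sketch of a Chain-of-Questions style loop.
# Every component here is a stub standing in for a multimodal LLM call.
from dataclasses import dataclass, field

MODALITIES = ("vision", "audio", "spatial")


@dataclass
class CoQState:
    task: str                                      # the user's original query
    evidence: list = field(default_factory=list)   # (question, modality, observation) triples


def generate_question(state: CoQState) -> str:
    """Curiosity step: ask a targeted question about what is still unknown (stub)."""
    return f"What additional detail is needed to answer: {state.task!r}?"


def route_modality(question: str) -> str:
    """Decide which sensory modality should answer the question.
    Stub keyword router; the paper presumably lets the model decide."""
    q = question.lower()
    if any(w in q for w in ("see", "color", "object")):
        return "vision"
    if any(w in q for w in ("hear", "sound", "say")):
        return "audio"
    return "spatial"


def query_modality(modality: str, question: str) -> str:
    """Invoke the chosen perception module (stubbed)."""
    return f"[{modality} observation for: {question}]"


def answer_or_none(state: CoQState) -> str | None:
    """Try to answer from accumulated evidence; None means 'not yet confident'.
    Stub: answers as soon as any evidence has been gathered."""
    if state.evidence:
        return f"Answer to {state.task!r} using {len(state.evidence)} observation(s)."
    return None


def chain_of_questions(task: str, max_steps: int = 3) -> str:
    state = CoQState(task=task)
    for _ in range(max_steps):
        answer = answer_or_none(state)
        if answer is not None:
            return answer
        question = generate_question(state)               # 1. curiosity-driven question
        modality = route_modality(question)               # 2. select a relevant modality
        observation = query_modality(modality, question)  # 3. gather evidence
        state.evidence.append((question, modality, observation))
    return answer_or_none(state) or "No confident answer within the step budget."


if __name__ == "__main__":
    print(chain_of_questions("What object is making the sound in the room?"))
```

The key design point the sketch captures is that modality activation is driven by the generated questions rather than fixed in advance, which is what the abstract credits for the reported gains in accuracy and interpretability.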