In-Depth and In-Breadth: Pre-training Multimodal Language Models Customized for Comprehensive Chart Understanding

cs.AI updates on arXiv.org, July 22, 12:44

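This article introduces ChartScope, an LVLM optimized for chart understanding. Using a data generation pipeline and a Dual-Path training strategy, it substantially improves comprehension of diverse chart types.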

arXiv:2507.14298v1 Announce Type: cross Abstract: Recent methods for customizing Large Vision Language Models (LVLMs) for domain-specific tasks have shown promising results in scientific chart comprehension. However, existing approaches face two major limitations. First, they rely on paired data from only a few chart types, which limits generalization to a wide range of chart types. Second, they lack targeted pre-training for chart-data alignment, which hampers the model's understanding of the underlying data. In this paper, we introduce ChartScope, an LVLM optimized for in-depth chart comprehension across diverse chart types. We propose an efficient data generation pipeline that synthesizes paired data for a wide range of chart types, along with a novel Dual-Path training strategy that enables the model to succinctly capture essential data details while preserving robust reasoning capabilities by incorporating reasoning over the underlying data. Lastly, we establish ChartDQA, a new benchmark for evaluating not only question answering at different levels but also understanding of the underlying data. Experimental results demonstrate that ChartScope significantly enhances comprehension on a wide range of chart types. The code and data are available at https://davidhalladay.github.io/chartscope_demo.
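
The abstract does not spell out how the data generation pipeline works, so the following is only a minimal sketch of the general idea of synthesizing paired data (a chart type, its underlying data, and a question whose answer is derived from that data). The chart-type list, the record fields, and the functions `synthesize_pair` and `build_dataset` are all hypothetical and are not taken from the ChartScope code. Rendering the chart image is left out.

```python
import json
import random

# Hypothetical subset of chart types; the paper's actual list is not given in the abstract.
CHART_TYPES = ["bar", "line", "pie"]


def synthesize_pair(chart_type, rng):
    """Build one illustrative record: chart type, underlying data, and a QA pair."""
    categories = [f"Category {c}" for c in "ABCD"]
    # Sample distinct values so the "highest value" question has a unique answer.
    values = rng.sample(range(1, 100), k=len(categories))
    table = dict(zip(categories, values))
    top = max(table, key=table.get)
    return {
        "chart_type": chart_type,
        # Underlying data serialized as JSON (an assumed format for this sketch).
        "data_json": json.dumps(table),
        "question": "Which category has the highest value?",
        "answer": top,
    }


def build_dataset(n_per_type, seed=0):
    """Generate n_per_type records for each chart type, reproducibly."""
    rng = random.Random(seed)
    return [
        synthesize_pair(chart_type, rng)
        for chart_type in CHART_TYPES
        for _ in range(n_per_type)
    ]


if __name__ == "__main__":
    for record in build_dataset(1):
        print(record["chart_type"], record["data_json"], record["answer"])
```

Because every answer is computed from the sampled table rather than read off an image, each record in this sketch keeps the chart's underlying data and its question-answer pair consistent by construction.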


