Chart-CoCa: Self-Improving Chart Understanding of Vision LMs via Code-Driven Synthesis and Candidate-Conditioned Answering

cs.AI updates on arXiv.org, 13 hours ago

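This paper proposes a synthetic-data approach to chart understanding for VLMs: it generates chart-question-answer triplets through code generation and execution, and designs a candidate-conditioned answering process, significantly improving VLM accuracy on chart understanding.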

arXiv:2508.11975v1 Announce Type: new

Abstract: Vision Language Models (VLMs) often struggle with chart understanding tasks, particularly accurate chart description and complex reasoning. Synthetic data generation is a promising solution, but it usually faces the challenge of noisy labels. To address this challenge, we first introduce a chart synthesis pipeline that generates aligned chart-question-answer triplets through code generation and execution, ensuring the reliability of synthetic data without human intervention. Furthermore, inspired by test-time scaling, which increases the inference budget and thereby improves performance, we design a candidate-conditioned answering process: the VLM first generates multiple responses per query and then synthesizes the final answer by contextualizing these candidates. Experiments demonstrate significant improvements, with an accuracy gain of up to 15.50 points over the initial VLM, in a fully self-improving paradigm that uses neither human-labeled data nor external models.
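The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`synthesize_triplet`, `candidate_conditioned_answer`, `make_stub_model`) is hypothetical. Two assumptions are worth stating loudly: a hand-written generator stands in for the model-generated plotting code, and a stub that majority-votes stands in for the VLM. The sketch only shows why the labels are reliable (the answer is computed from the same data that draws the chart) and how the final answer is conditioned on earlier candidates.

```python
import random
from collections import Counter
from typing import Callable, Dict, List

LABELS = ["A", "B", "C", "D"]


def synthesize_triplet(seed: int) -> Dict[str, str]:
    """Build one chart-question-answer triplet from a single data sample.

    Assumption: in the paper the plotting code is generated by the model and
    then executed; here a fixed generator plays that role.
    """
    rng = random.Random(seed)
    # Distinct values, so "tallest bar" has exactly one correct answer.
    values = rng.sample(range(10, 100), k=len(LABELS))
    bars = "".join(
        f'<rect x="{20 + 40 * i}" y="{100 - v}" width="30" height="{v}"/>'
        f'<text x="{20 + 40 * i}" y="115">{name}</text>'
        for i, (name, v) in enumerate(zip(LABELS, values))
    )
    chart = (
        '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="120">'
        f"{bars}</svg>"
    )
    # The label comes from the data itself, not from reading the image.
    answer = LABELS[values.index(max(values))]
    question = "Which category has the tallest bar?"
    return {"chart": chart, "question": question, "answer": answer}


def candidate_conditioned_answer(
    model: Callable[[str, str], str],
    chart: str,
    question: str,
    n_candidates: int = 4,
) -> str:
    """Sample several candidate answers, then ask the model for a final answer
    with those candidates placed in its context."""
    candidates = [model(chart, question) for _ in range(n_candidates)]
    listing = "\n".join(
        f"Candidate {i + 1}: {c}" for i, c in enumerate(candidates)
    )
    prompt = (
        f"Question: {question}\n{listing}\n"
        "Considering the candidates above, give the final answer."
    )
    return model(chart, prompt)


def make_stub_model(replies: List[str]) -> Callable[[str, str], str]:
    """Hypothetical stand-in for a VLM. It cycles through canned replies, and
    when the prompt lists candidates it returns the most common one. A real
    VLM would reason over the chart and the candidates instead."""
    state = {"calls": 0}

    def model(chart: str, prompt: str) -> str:
        if "Candidate 1:" in prompt:
            votes = [
                line.split(": ", 1)[1]
                for line in prompt.splitlines()
                if line.startswith("Candidate ")
            ]
            return Counter(votes).most_common(1)[0][0]
        reply = replies[state["calls"] % len(replies)]
        state["calls"] += 1
        return reply

    return model
```

With the stub configured as `make_stub_model(["B", "A", "B", "C"])`, calling `candidate_conditioned_answer(model, chart, question)` samples the four canned replies as candidates and returns `"B"`. Swapping the stub for a real VLM call would leave the two-pass structure unchanged.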

Related tags

Vision Language Models, chart understanding, synthetic data, candidate-conditioned answering, accuracy improvement
