cs.AI updates on arXiv.org 前天 12:08
CHECK-MAT: Checking Hand-Written Mathematical Answers for the Russian Unified State Exam
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文介绍了一种新的基准EGE-Math Solutions Assessment Benchmark,用于评估视觉语言模型在评估手写数学解决方案方面的能力。该基准侧重于理解学生解决方案、识别错误并根据固定标准评分。研究结果表明,当前VLMs在数学推理和人类评分标准对齐方面存在局限性。

arXiv:2507.22958v1 Announce Type: cross Abstract: This paper introduces a novel benchmark, EGE-Math Solutions Assessment Benchmark, for evaluating Vision-Language Models (VLMs) on their ability to assess hand-written mathematical solutions. Unlike existing benchmarks that focus on problem solving, our approach centres on understanding student solutions, identifying mistakes, and assigning grades according to fixed criteria. We compile 122 scanned solutions from the Russian Unified State Exam (EGE) together with official expert grades, and evaluate seven modern VLMs from Google, OpenAI, Arcee AI, and Alibaba Cloud in three inference modes. The results reveal current limitations in mathematical reasoning and human-rubric alignment, opening new research avenues in AI-assisted assessment. You can find code in https://github.com/Karifannaa/Auto-check-EGE-math

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

视觉语言模型 数学解题评估 EGE-Math基准
相关文章