TaxCalcBench: Evaluating Frontier Models on the Tax Calculation Task

cs.AI updates on arXiv.org, July 23, 12:03

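This paper introduces TaxCalcBench, a benchmark for evaluating how well AI models calculate personal income taxes. The study finds that current models correctly calculate fewer than a third of federal income tax returns even on a simplified sample set, with failures that include misused tax tables and calculation errors. It concludes that applying LLMs to personal income tax calculation still requires additional supporting infrastructure.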

arXiv:2507.16126v1 (announce type: new)

Abstract: Can AI file your taxes? Not yet. Calculating US personal income taxes is a task that requires building an understanding of vast amounts of English text and using that knowledge to carefully compute results. We propose TaxCalcBench, a benchmark for determining models' abilities to calculate personal income tax returns given all of the necessary information. Our experiment shows that state-of-the-art models succeed in calculating less than a third of federal income tax returns even on this simplified sample set. Our analysis concludes that models consistently misuse tax tables, make errors in tax calculation, and incorrectly determine eligibility. Our findings point to the need for additional infrastructure to apply LLMs to the personal income tax calculation task.
