cs.AI updates on arXiv.org 前天 12:33
Advocate for Complete Benchmarks for Formal Reasoning with Formal/Informal Statements and Formal/Informal Proofs
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文针对形式推理与自动定理证明领域的基准测试和评估实践进行批判性分析,提出开放代码、数据和完备无错误的基准将加速该领域发展,并探讨如何消除阻碍贡献的实践及可能产生误导性信息的评估方式。

arXiv:2507.04719v1 Announce Type: new Abstract: This position paper provides a critical but constructive discussion of current practices in benchmarking and evaluative practices in the field of formal reasoning and automated theorem proving. We take the position that open code, open data, and benchmarks that are complete and error-free will accelerate progress in this field. We identify practices that create barriers to contributing to this field and suggest ways to remove them. We also discuss some of the practices that might produce misleading evaluative information. We aim to create discussions that bring together people from various groups contributing to automated theorem proving, autoformalization, and informal reasoning.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

形式推理 自动定理证明 基准测试 开放数据 评估实践
相关文章