Trending
Articles related to "AI benchmark evaluation"
Deprecating Benchmarks: Criteria and Framework
cs.AI updates on arXiv.org 2025-07-10T04:05:42.000000Z
Establishing Best Practices for Building Rigorous Agentic Benchmarks
cs.AI updates on arXiv.org 2025-07-04T04:08:25.000000Z