Related articles for "Benchmark Construction"
Dynamic Benchmark Construction for Evaluating Large Language Models on Real-World Codes
cs.AI updates on arXiv.org
2025-08-12T04:39:33.000000Z
Rethinking Domain-Specific LLM Benchmark Construction: A Comprehensiveness-Compactness Approach
cs.AI updates on arXiv.org
2025-08-12T04:02:07.000000Z
From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline
None
2024-10-02T06:00:21.000000Z