Trending
Articles related to "Benchmark Construction"
Dynamic Benchmark Construction for Evaluating Large Language Models on Real-World Codes
cs.AI updates on arXiv.org 2025-08-12T04:39:33.000000Z
Rethinking Domain-Specific LLM Benchmark Construction: A Comprehensiveness-Compactness Approach
cs.AI updates on arXiv.org 2025-08-12T04:02:07.000000Z
From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline
2024-10-02T06:00:21.000000Z