Trending
Articles related to "benchmark suite"
REALM-Bench: A Benchmark for Evaluating Multi-Agent Systems on Real-world, Dynamic Planning and Scheduling Tasks
cs.AI updates on arXiv.org 2025-08-06T04:02:15.000000Z
SpectrumWorld: Artificial Intelligence Foundation for Spectroscopy
cs.AI updates on arXiv.org 2025-08-05T11:29:09.000000Z
AblationBench: Evaluating Automated Planning of Ablations in Empirical AI Research
cs.AI updates on arXiv.org 2025-07-14T04:08:25.000000Z