Articles related to "Benchmark Platform"
BALSAM: A Platform for Benchmarking Arabic Large Language Models
cs.AI updates on arXiv.org
2025-07-31T04:48:13.000000Z
PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors
cs.AI updates on arXiv.org
2025-07-22T04:34:13.000000Z
GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning
cs.AI updates on arXiv.org
2025-07-08T04:33:41.000000Z