Articles related to "benchmark platform"
BALSAM: A Platform for Benchmarking Arabic Large Language Models
cs.AI updates on arXiv.org 2025-07-31T04:48:13.000000Z
PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors
cs.AI updates on arXiv.org 2025-07-22T04:34:13.000000Z
GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning
cs.AI updates on arXiv.org 2025-07-08T04:33:41.000000Z