Trending
Articles related to "evaluation protocol"
Unifying Post-hoc Explanations of Knowledge Graph Completions
cs.AI updates on arXiv.org 2025-08-01T04:08:36.000000Z
JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1
cs.AI updates on arXiv.org 2025-07-29T04:21:30.000000Z
Small Edits, Big Consequences: Telling Good from Bad Robustness in Large Language Models
cs.AI updates on arXiv.org 2025-07-23T04:03:12.000000Z
Metric assessment protocol in the context of answer fluctuation on MCQ tasks
cs.AI updates on arXiv.org 2025-07-22T04:34:20.000000Z
PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors
cs.AI updates on arXiv.org 2025-07-22T04:34:13.000000Z