Articles related to "MMLU-Pro"
Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks
cs.AI updates on arXiv.org
2025-07-24T05:31:26.000000Z
Authoritative LLM benchmark exposed as flawed! It favors closed-source models such as GPT-4 and even treats prompts differently
BAAI Community (智源社区)
2024-07-12T07:35:55.000000Z
MMLU-Pro: An Enhanced Benchmark Designed to Evaluate Language Understanding Models Across Broader and More Challenging Tasks
MarkTechPost@AI
2024-06-06T07:01:04.000000Z