Trending
Articles related to "benchmark datasets"
C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations
cs.AI updates on arXiv.org 2025-08-01T04:08:26.000000Z
Reading Between the Timelines: RAG for Answering Diachronic Questions
cs.AI updates on arXiv.org 2025-08-01T04:08:21.000000Z
Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models
cs.AI updates on arXiv.org 2025-07-29T04:22:36.000000Z
Explainable Synthetic Image Detection through Diffusion Timestep Ensembling
cs.AI updates on arXiv.org 2025-07-29T04:21:52.000000Z
MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts
cs.AI updates on arXiv.org 2025-07-28T04:43:04.000000Z
Retrieval-Augmented Clinical Benchmarking for Contextual Model Testing in Kenyan Primary Care: A Methodology Paper
cs.AI updates on arXiv.org 2025-07-22T04:44:37.000000Z
MCFormer: A Multi-Cost-Volume Network and Comprehensive Benchmark for Particle Image Velocimetry
cs.AI updates on arXiv.org 2025-07-08T05:54:08.000000Z
From Imitation to Innovation: The Emergence of AI Unique Artistic Styles and the Challenge of Copyright Protection
cs.AI updates on arXiv.org 2025-07-08T04:33:40.000000Z
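A new challenge for multimodal document understanding: ByteDance and Huazhong University of Science and Technology jointly release the WildDoc benchmark, revealing the robustness weaknesses of MLLMs in real-world document understanding
我爱计算机视觉 (I Love Computer Vision) 2025-05-26T13:07:16.000000Z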
Sci. Adv. | Machine learning empowers quantum computing: adaptive hybrid functionals improve prediction accuracy
智源社区 (BAAI Community) 2025-04-16T06:07:52.000000Z
Is Sentiment Analysis in Qualitative Data Analysis Software Accurate?
Blog on Text Analytics - Provalis Research 2024-11-27T08:38:01.000000Z
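When large vision models fall into cognitive dissonance: the University of Maryland builds a framework for automatically generating hallucinations
36氪 (36Kr) - Tech Channel 2024-11-11T09:13:52.000000Z
When large vision models fall into cognitive dissonance: the University of Maryland builds a framework for automatically generating hallucinations
机器之心 (Synced) 2024-11-11T06:39:15.000000Z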
SMART Filtering: Enhancing Benchmark Quality and Efficiency for NLP Model Evaluation
MarkTechPost@AI 2024-11-04T08:20:35.000000Z
ArabLegalEval: A Multitask AI Benchmark Dataset for Assessing the Arabic Legal Knowledge of LLMs
MarkTechPost@AI 2024-08-19T09:49:42.000000Z
Benchmark Self-Evolving | A self-evolving dynamic evaluation benchmark for large models
智源社区 (BAAI Community) 2024-07-11T06:20:59.000000Z