Trending
Articles related to "Language Model Evaluation"
Question Generation for Assessing Early Literacy Reading Comprehension
cs.AI updates on arXiv.org 2025-07-31T04:48:05.000000Z
Building on evaluation quicksand
Interconnects 2024-10-22T06:07:43.000000Z
Meet TurtleBench: A Unique AI Evaluation System for Evaluating Top Language Models via Real World Yes/No Puzzles
MarkTechPost@AI 2024-10-17T01:35:56.000000Z
OpenAI Releases the MMMLU Dataset: Broader, Deeper Evaluation of AI Models, with Simplified Chinese Support
ReadHub 2024-09-24T08:08:50.000000Z
Michelangelo: An Artificial Intelligence Framework for Evaluating Long-Context Reasoning in Large Language Models Beyond Simple Retrieval Tasks
MarkTechPost@AI 2024-09-22T12:05:34.000000Z
This AI Paper by Allen Institute Researchers Introduces OLMES: Paving the Way for Fair and Reproducible Evaluations in Language Modeling
MarkTechPost@AI 2024-06-21T09:01:43.000000Z
Application Task Driven: LLM Evaluation Metrics in Detail
DZone AI/ML Zone 2024-06-03T17:30:39.000000Z
EleutherAI Presents Language Model Evaluation Harness (lm-eval) for Reproducible and Rigorous NLP Assessments, Enhancing Language Model Evaluation
MarkTechPost@AI 2024-05-26T06:31:00.000000Z