Language Model Evaluation_Fishai

Articles related to "Language Model Evaluation"

Question Generation for Assessing Early Literacy Reading Comprehension

cs.AI updates on arXiv.org 2025-07-31T04:48:05.000000Z

Building on evaluation quicksand

Interconnects 2024-10-22T06:07:43.000000Z

Meet TurtleBench: A Unique AI Evaluation System for Evaluating Top Language Models via Real World Yes/No Puzzles

MarkTechPost@AI 2024-10-17T01:35:56.000000Z

OpenAI Releases MMMLU Dataset: Broader, Deeper Evaluation of AI Models, with Simplified Chinese Support

ReadHub 2024-09-24T08:08:50.000000Z

Michelangelo: An Artificial Intelligence Framework for Evaluating Long-Context Reasoning in Large Language Models Beyond Simple Retrieval Tasks

MarkTechPost@AI 2024-09-22T12:05:34.000000Z

This AI Paper by Allen Institute Researchers Introduces OLMES: Paving the Way for Fair and Reproducible Evaluations in Language Modeling

MarkTechPost@AI 2024-06-21T09:01:43.000000Z

Application Task Driven: LLM Evaluation Metrics in Detail

DZone AI/ML Zone 2024-06-03T17:30:39.000000Z

EleutherAI Presents Language Model Evaluation Harness (lm-eval) for Reproducible and Rigorous NLP Assessments, Enhancing Language Model Evaluation

MarkTechPost@AI 2024-05-26T06:31:00.000000Z
