LLM Evaluation_Fishai

Articles related to "LLM Evaluation"

StructText: A Synthetic Table-to-Text Approach for Benchmark Generation with Multi-Dimensional Evaluation

cs.AI updates on arXiv.org 2025-07-30T04:46:13.000000Z

Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles

cs.AI updates on arXiv.org 2025-07-22T04:34:38.000000Z

MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark

cs.AI updates on arXiv.org 2025-06-30T04:14:30.000000Z

ArabLegalEval: A Multitask AI Benchmark Dataset for Assessing the Arabic Legal Knowledge of LLMs

MarkTechPost@AI 2024-08-19T09:49:42.000000Z

Copyright © 2019 FISHAI. All Rights Reserved