Articles related to "evaluation metrics"
RAG Evaluation in One Article: Unlocking the Key to Large Model Performance
Juejin (掘金) AI 2025-08-01T11:35:12.000000Z
Data Augmentation for Spoken Grammatical Error Correction
cs.AI updates on arXiv.org 2025-07-28T04:43:00.000000Z
Diffusion Models for Time Series Forecasting: A Survey
cs.AI updates on arXiv.org 2025-07-22T04:44:33.000000Z
Transformer-based Spatial Grounding: A Comprehensive Survey
cs.AI updates on arXiv.org 2025-07-18T04:13:56.000000Z
A Survey of Deep Learning for Geometry Problem Solving
cs.AI updates on arXiv.org 2025-07-17T04:14:37.000000Z
SimAD: A Simple Dissimilarity-based Approach for Time Series Anomaly Detection
cs.AI updates on arXiv.org 2025-07-16T04:28:34.000000Z
Comprehensive Evaluation of Prototype Neural Networks
cs.AI updates on arXiv.org 2025-07-10T04:05:52.000000Z
Understanding Knowledge Transferability for Transfer Learning: A Survey
cs.AI updates on arXiv.org 2025-07-08T05:54:10.000000Z
GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning
cs.AI updates on arXiv.org 2025-07-08T04:33:41.000000Z
Generating Heterogeneous Multi-dimensional Data: A Comparative Study
cs.AI updates on arXiv.org 2025-07-02T04:03:45.000000Z
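A Practical Guide to Evaluating and Debugging RAG Knowledge Bases: Context Loss × Ignored Information × Multi-turn Dialogue Breakdown
Juejin (掘金) AI 2025-05-23T06:23:04.000000Z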
Subject-Driven Image Evaluation Gets Simpler: Google Researchers Introduce REFVNLI to Jointly Score Textual Alignment and Subject Consistency Without Costly APIs
MarkTechPost@AI 2025-05-02T20:05:39.000000Z
ICLR 2025 | Benchmarking Large Language Model Reviews Without a Gold Standard
BAAI Community (智源社区) 2025-04-25T06:43:12.000000Z
Let the LLM Be the Judge | Evaluating Your Evaluation Results
Hugging Face 2025-04-09T10:06:26.000000Z
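Latest Cultural Development Evaluation Standards Released: Six Categories of Bonus Items Specified, with Dedicated Chapters on Leveraging Think Tanks and Encouraging International Exchange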
深度财经头条 2025-03-21T09:31:30.000000Z
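Analysis Report on Independent Income Sources and Overall Financial Health of Older Adults in Four Asian Countries and the United States
199IT (互联网数据资讯网) 2025-03-14T23:01:19.000000Z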
DeepSeek, OpenAI, or Kimi: Which Is Best at Visual Reasoning? CUHK MMLab Releases the Reasoning Benchmark MME-COT
BAAI Community (智源社区) 2025-02-23T12:37:14.000000Z
Let the LLM Be the Judge | Evaluating Your Evaluation Results
BAAI Community (智源社区) 2025-02-11T03:22:24.000000Z
Let the LLM Be the Judge | Evaluating Your Evaluation Results
Hugging Face 2025-02-10T16:15:18.000000Z
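Exposing the "Flaws" of the Unsung Hero Behind Large Models' Strong Reasoning: A New Benchmark for Process-Level Reward Models Arrives
QbitAI (量子位) 2025-01-19T07:41:41.000000Z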