Trending
Articles related to "Evaluation Methods"
The Second Half: An OpenAI Scientist's Insights on AI's Second Half
海外独角兽 2025-04-19T06:21:46.000000Z
Beware of AI's "Rare" Dangerous Behaviors
虎嗅-AI 2025-02-27T09:20:08.000000Z
Letting LLMs Be the Judge | On Reward Models
Hugging Face 2025-02-14T17:15:15.000000Z
Livestream | Popular LLM-as-a-Judge Papers: A Survey Talk on AI Becoming the "Judge", Plus an AI + Finance Roundtable, IDEA Research Institute
智源社区 2025-01-14T09:05:19.000000Z
Assessment in Computer Science Education in the GenAI Era
Communications of the ACM - Artificial Intelligence 2025-01-10T16:17:07.000000Z
From Contradictions to Coherence: Logical Alignment in AI Models
MarkTechPost@AI 2025-01-09T06:34:43.000000Z
DeepMind Research Introduces The FACTS Grounding Leaderboard: Benchmarking LLMs’ Ability to Ground Responses to Long-Form Input
MarkTechPost@AI 2025-01-08T04:25:14.000000Z
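Your Own "Iron Man" Assistant Has Arrived with OS Agents! Zhejiang University Teams Up with OPPO, 01.AI, and Others, 10 Institutions in All, to Release a New Survey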
量子位 2025-01-06T07:58:05.000000Z
This AI Paper Introduces LLM-as-an-Interviewer: A Dynamic AI Framework for Comprehensive and Adaptive LLM Evaluation
MarkTechPost@AI 2025-01-04T01:11:13.000000Z
New Evals for Better Models, AI Research Papers Made Easier to Understand, Train Your Own Flux LoRA, and More
Society's Backend 2024-12-13T06:24:24.000000Z
Red Teaming for AI: Strengthening Safety and Trust through External Evaluation
MarkTechPost@AI 2024-11-26T07:49:56.000000Z
Researchers at Peking University Introduce A New AI Benchmark for Evaluating Numerical Understanding and Processing in Large Language Models
MarkTechPost@AI 2024-11-09T08:19:46.000000Z
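Large Models Are Chasing an "Oscar" Too: HKUST, Tencent, and Others Present a Panoramic Survey of AI Role-Playing, Examining Key Details from Four Angles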
智源社区 2024-11-04T05:08:10.000000Z
Sabotage Evaluations for Frontier Models
少点错误 2024-10-18T22:38:04.000000Z
OpenAI's Latest 53-Page Paper: ChatGPT Treats Users Differently Depending on Who They Are, Answering "Xiaomei" and "Xiaoshuai" Inconsistently
IT之家 2024-10-16T06:23:42.000000Z
Exposing Vulnerabilities in Automatic LLM Benchmarks: The Need for Stronger Anti-Cheating Mechanisms
MarkTechPost@AI 2024-10-13T12:51:10.000000Z
Nature: AI Has Even Won a Nobel Prize, but Can It Have Common Sense Like Humans?
智源社区 2024-10-11T14:53:56.000000Z
AI Has Even Won a Nobel Prize, but Can It Have Common Sense Like Humans?
虎嗅 2024-10-11T01:24:06.000000Z
A Full-Stack Technical Survey of LLM-Based NL2SQL Frameworks
PaperAgent 2024-10-04T12:38:03.000000Z
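After Six Consecutive Limit-Up Sessions, 双成药业 Discloses Progress on Its Major Asset Restructuring: The Valuation of Target Company 奥拉股份 Will Be Significantly Below the 10 Billion Yuan Valuation of Its Previous Funding Round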
界面快报 2024-09-20T13:20:42.000000Z