Articles related to "Model Evaluation"
LLM Evaluation Troubleshooting Guide | On LaTeX Formula Parsing
Hugging Face
2025-06-12T02:32:47.000000Z
AI companies' eval reports mostly don't support their claims
LessWrong
2025-06-09T13:02:35.000000Z
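Six Months of Frantic AI Evolution Condensed into One Tier Chart: 30+ Models Battle It Out as an Expert's Talk Goes Viral
36Kr - AI-related articles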
2025-06-09T12:19:21.000000Z
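Altman Is Using ChatGPT Wrong. Latest Research: Asking for a “Direct Answer” Lowers Accuracy, and the Benefit of Chain-of-Thought Prompting Is Also Declining
36Kr - Technology channel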
2025-06-09T10:34:26.000000Z
The Decreasing Value of Chain of Thought in Prompting
LessWrong
2025-06-08T15:22:35.000000Z
Apple's Latest Research: Current Large AI Models Are “More Like Memorizing Than Truly Reasoning”
ITHome
2025-06-08T08:18:22.000000Z
DeepSeek-r1-0528 Did Not Have a Moment
LessWrong
2025-06-06T15:42:34.000000Z
An AIME and MATH500 Evaluation Approach Based on the MindIE Service on Ascend NPUs
Juejin Artificial Intelligence
2025-06-06T02:53:44.000000Z
Claude 4 Core Team Members: In 2027, AI Will Automate Nearly All White-Collar Work
Huxiu
2025-05-31T13:39:16.000000Z
New website analyzing AI companies' model evals
LessWrong
2025-05-26T16:07:31.000000Z
Meta Researchers Introduced J1: A Reinforcement Learning Framework That Trains Language Models to Judge With Reasoned Consistency and Minimal Data
MarkTechPost@AI
2025-05-21T20:40:47.000000Z
From BGE to CLIP, from Text to Multimodal: The Ultimate Guide to Choosing an Embedding Model
Zilliz
2025-05-20T11:51:02.000000Z
It Is Untenable That Near-Future AI Scenario Models Like “AI 2027” Don't Include Open Source AI
LessWrong
2025-05-16T02:27:27.000000Z
Hot on Hacker News | “Jagged AGI”: Are o3 and Gemini 2.5 Actually AGI?
Silicon Star GenAI (硅星GenAI)
2025-05-14T20:16:22.000000Z
Use custom metrics to evaluate your generative AI application with Amazon Bedrock
AWS Machine Learning Blog
2025-05-06T21:52:53.000000Z
68-Page Paper Hammers the LLM Arena Again! 27 Versions Privately Tested Before Llama4's Release, with Only the Best Score Kept
BAAI Community
2025-05-06T02:48:01.000000Z
GPT-4o Sycophancy Post Mortem
LessWrong
2025-05-05T16:02:31.000000Z
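Bombshell Scandal in AI Circles: Is Meta's Score Gaming Confirmed? Top Leaderboard's Murky Dealings Exposed, and Stanford and MIT Researchers Denounce It
BAAI Community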
2025-05-02T15:39:42.000000Z
68-Page Paper Hammers the LLM Arena Again: 27 Versions Privately Tested Before Llama4's Release, with Only the Best Score Kept
ITHome
2025-05-02T12:48:50.000000Z
Study accuses LM Arena of helping top AI labs game its benchmark
TechCrunch News
2025-05-01T00:16:26.000000Z