Trending
Articles related to "AI model evaluation"
How do two LLM practitioners from a group chat rate Xiaomi's MiMo large model?
理想 TOP2 2025-05-08T07:51:29.000000Z
Crowdsourced AI benchmarks have serious flaws, some experts say
TechCrunch News 2025-04-22T12:36:37.000000Z
OpenAI acquires the Context.ai team, further upgrading its AI evaluation capabilities
IT之家 2025-04-15T23:28:40.000000Z
OpenAI hires team behind GV-backed AI eval platform Context.ai
TechCrunch News 2025-04-15T18:21:22.000000Z
OpenAI publicly accuses Grok3 of cheating: answering each question 64 times to get a leg up in the comparison with o3-mini
量子位 2025-02-24T01:13:50.000000Z
Incredible! A leading AI expert reviews Grok 3: performance rivals OpenAI's strongest model and is slightly better than DeepSeek-R1
华尔街见闻 - 资讯 2025-02-18T06:38:39.000000Z
Singapore University of Technology and Design (SUTD) Explores Advancements and Challenges in Multimodal Reasoning for AI Models Through Puzzle-Based Evaluations and Algorithmic Problem-Solving Analysis
MarkTechPost@AI 2025-02-08T04:20:02.000000Z
How we evaluate AI models and LLMs for GitHub Copilot
The GitHub Blog 2025-01-17T18:00:52.000000Z
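Leaving "hallucinations" nowhere to hide! Google DeepMind's new benchmark, with three generations of Gemini topping the leaderboard together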
新智元 2025-01-13T16:54:44.000000Z
Google reportedly uses Anthropic's Claude to test its Gemini models
AI & Big Data 2024-12-25T05:02:35.000000Z
Google is using Anthropic’s Claude to improve its Gemini AI
TechCrunch News 2024-12-24T16:22:11.000000Z
A safe harbor for AI evaluation and red teaming
AI Snake Oil 2024-12-13T05:08:43.000000Z
Integrating 500+ real-world multimodal tasks! The new MEGA-Bench evaluation suite: is CoT actually harmful to open-source models?
新智元 2024-11-16T14:16:08.000000Z
Integrating 500+ real-world multimodal tasks, the new MEGA-Bench evaluation suite: is CoT actually harmful to open-source models?
36kr-科技 2024-11-15T07:36:44.000000Z
Scoring AI models: Endor Labs unveils evaluation tool
AI News 2024-10-16T13:19:32.000000Z
OpenAI launches the SWE-bench Verified benchmark to more accurately evaluate AI models' code generation performance
IT之家 2024-08-15T06:52:31.000000Z
OpenAI launches a code generation evaluation benchmark
ReadHub 2024-08-14T01:37:59.000000Z
Meta launches "Self-Taught Evaluator": improves evaluation without human annotation, outperforming commonly used LLM judges such as GPT-4
IT之家 2024-08-07T08:07:34.000000Z
An introduction to large language model evaluation techniques
Security产业趋势 2024-07-23T16:07:05.000000Z
Top 12 Trending LLM Leaderboards: A Guide to Leading AI Models’ Evaluation
MarkTechPost@AI 2024-06-03T04:01:02.000000Z