Articles related to "AI Benchmarks"
Import AI 413: 40B distributed training run; avoiding the ‘One True Answer’ fallacy of AI safety; Google releases a content classification model
Import AI
2025-05-19T12:57:58.000000Z
These two new AI benchmarks could help make models less biased
MIT Technology Review » Artificial Intelligence
2025-03-11T09:37:35.000000Z
These questions are too hard! New benchmark stumps multimodal models across the board, and even GPT-4o scores zero
机器之心
2025-02-18T07:08:57.000000Z
‘Not on the Best Path’
Communications of the ACM - Artificial Intelligence
2025-02-13T17:25:21.000000Z
Meta AI Introduces PARTNR: A Research Framework Supporting Seamless Human-Robot Collaboration in Multi-Agent Tasks
MarkTechPost@AI
2025-02-12T16:57:09.000000Z
DeepSeek-R1 and o1 both score below 10%: humanity's ‘last exam’ for AI arrives, with a contributor list two pages long
机器之心
2025-02-08T07:50:03.000000Z
Researchers use public radio's ‘Sunday Puzzle’ questions to benchmark AI reasoning models
Cnbeta
2025-02-06T08:34:14.000000Z
DeepSeek claims its ‘reasoning’ model beats OpenAI’s o1 on certain benchmarks
TechCrunch News
2025-01-27T22:35:57.000000Z
Even the best AI cannot beat this new benchmark, scoring less than 10%
Cnbeta
2025-01-24T02:07:07.000000Z
AI benchmarking organization FrontierMath questioned over misconduct after delaying disclosure of its OpenAI funding
IT之家
2025-01-20T14:22:26.000000Z
PredBench: A Comprehensive AI Benchmark for Evaluating 12 Spatio-Temporal Prediction Methods Across 15 Diverse Datasets with Multi-Dimensional Analysis
MarkTechPost@AI
2024-07-18T04:46:29.000000Z
MJ-BENCH: A Multimodal AI Benchmark for Evaluating Text-to-Image Generation with Focus on Alignment, Safety, and Bias
MarkTechPost@AI
2024-07-13T04:31:18.000000Z
GraCoRe: A New AI Benchmark for Unveiling Strengths and Weaknesses in LLM Graph Comprehension and Reasoning
MarkTechPost@AI
2024-07-09T07:31:27.000000Z
DataComp for Language Models (DCLM): An AI Benchmark for Language Model Training Data Curation
MarkTechPost@AI
2024-06-19T12:01:39.000000Z