Articles related to "AI Capability Evaluation"
Interpreting the METR Time Horizons Post
LessWrong
2025-04-30T03:12:28.000000Z
Recent AI model progress feels mostly like bullshit
LessWrong
2025-03-24T19:32:10.000000Z
The Elicitation Game: Evaluating capability elicitation techniques
LessWrong
2025-02-27T20:36:59.000000Z
These researchers used NPR Sunday Puzzle questions to benchmark AI ‘reasoning’ models
TechCrunch News
2025-02-06T06:12:36.000000Z
Understanding Benchmarks and motivating Evaluations
LessWrong
2025-02-06T01:51:47.000000Z
“Humanity’s Last Exam” benchmark released: top AI systems perform dismally, with none exceeding 10% answer accuracy
IT之家
2025-01-24T08:37:28.000000Z