AI Capability Evaluation_Fishai

Articles related to "AI Capability Evaluation"

DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering

cs.AI updates on arXiv.org 2025-07-16T04:28:40.000000Z

Interpreting the METR Time Horizons Post

LessWrong 2025-04-30T03:12:28.000000Z

Recent AI model progress feels mostly like bullshit

LessWrong 2025-03-24T19:32:10.000000Z

The Elicitation Game: Evaluating capability elicitation techniques

LessWrong 2025-02-27T20:36:59.000000Z

These researchers used NPR Sunday Puzzle questions to benchmark AI ‘reasoning’ models

TechCrunch News 2025-02-06T06:12:36.000000Z

Understanding Benchmarks and motivating Evaluations

LessWrong 2025-02-06T01:51:47.000000Z

"Humanity's Last Exam" benchmark released: top AI systems perform dismally, with none exceeding 10% answer accuracy

IT之家 (IT Home) 2025-01-24T08:37:28.000000Z
