AI Benchmarks_Fishai

Trending

Articles related to "AI Benchmarks"

TextQuests: How Good are LLMs at Text-Based Video Games?

cs.AI updates on arXiv.org 2025-08-01T04:08:16.000000Z

Import AI 413: 40B distributed training run; avoiding the ‘One True Answer’ fallacy of AI safety; Google releases a content classification model

Import AI 2025-05-19T12:57:58.000000Z

These two new AI benchmarks could help make models less biased

MIT Technology Review » Artificial Intelligence 2025-03-11T09:37:35.000000Z

These Questions Are Too Hard! New Benchmark Stumps Multimodal Models Across the Board, and Even GPT-4o Scores Zero

机器之心 2025-02-18T07:08:57.000000Z

‘Not on the Best Path’

Communications of the ACM - Artificial Intelligence 2025-02-13T17:25:21.000000Z

Meta AI Introduces PARTNR: A Research Framework Supporting Seamless Human-Robot Collaboration in Multi-Agent Tasks

MarkTechPost@AI 2025-02-12T16:57:09.000000Z

DeepSeek-R1 and o1 Both Score Below 10%: Humanity's "Last Exam" for AI Is Here, with a Contributor List Two Pages Long

机器之心 2025-02-08T07:50:03.000000Z

Researchers Use Public Radio's "Sunday Puzzle" Questions to Benchmark AI Reasoning Models

Cnbeta 2025-02-06T08:34:14.000000Z

DeepSeek claims its ‘reasoning’ model beats OpenAI’s o1 on certain benchmarks

TechCrunch News 2025-01-27T22:35:57.000000Z

Even the Best AI Cannot Beat This New Benchmark, Scoring Less Than 10%

Cnbeta 2025-01-24T02:07:07.000000Z

AI Benchmarking Organization FrontierMath Accused of Impropriety over Delayed Disclosure of OpenAI Funding

IT之家 2025-01-20T14:22:26.000000Z

PredBench: A Comprehensive AI Benchmark for Evaluating 12 Spatio-Temporal Prediction Methods Across 15 Diverse Datasets with Multi-Dimensional Analysis

MarkTechPost@AI 2024-07-18T04:46:29.000000Z

MJ-BENCH: A Multimodal AI Benchmark for Evaluating Text-to-Image Generation with Focus on Alignment, Safety, and Bias

MarkTechPost@AI 2024-07-13T04:31:18.000000Z

GraCoRe: A New AI Benchmark for Unveiling Strengths and Weaknesses in LLM Graph Comprehension and Reasoning

MarkTechPost@AI 2024-07-09T07:31:27.000000Z

DataComp for Language Models (DCLM): An AI Benchmark for Language Model Training Data Curation

MarkTechPost@AI 2024-06-19T12:01:39.000000Z

Copyright © 2019 FISHAI. All Rights Reserved