Capability Evaluation_Fishai

Articles related to "Capability Evaluation"

GovRelBench: A Benchmark for Government Domain Relevance

cs.AI updates on arXiv.org 2025-07-30T04:11:53.000000Z

ChatGPT Agent: evals and safeguards

LessWrong 2025-07-25T16:37:33.000000Z

ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry

cs.AI updates on arXiv.org 2025-07-23T04:03:06.000000Z

Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning

cs.AI updates on arXiv.org 2025-07-22T04:34:07.000000Z

Fairness Is Not Enough: Auditing Competence and Intersectional Bias in AI-powered Resume Screening

cs.AI updates on arXiv.org 2025-07-17T04:14:15.000000Z

Can LLMs Reliably Simulate Real Students' Abilities in Mathematics and Reading Comprehension?

cs.AI updates on arXiv.org 2025-07-14T04:08:31.000000Z

Do LLMs know what they're capable of? Why this matters for AI safety, and initial findings

LessWrong 2025-07-13T19:55:23.000000Z

[Workplace Topics] Can you be fired for refusing to work overtime?

V2EX 2025-06-21T02:08:19.000000Z

[Workplace Topics] Can you be fired for refusing to work overtime?

V2EX 2025-06-21T01:34:32.000000Z

[Cool Jobs] Can you be fired for refusing to work overtime?

V2EX 2025-06-20T22:32:33.000000Z

[Cool Jobs] Can you be fired for refusing to work overtime?

V2EX 2025-06-20T17:16:48.000000Z

[Cool Jobs] Can you be fired for refusing to work overtime?

V2EX 2025-06-20T15:45:38.000000Z

[Cool Jobs] Can you be fired for refusing to work overtime?

V2EX 2025-06-20T13:54:53.000000Z

ICML 2025 | Puncturing the AI Bubble with a "Human Exam Method": Building a New Ability-Oriented Adaptive Evaluation Paradigm

PaperWeekly 2025-05-27T06:22:33.000000Z

Recommendations for Technical AI Safety Research Directions

LessWrong 2025-01-10T19:37:04.000000Z

I read every major AI lab’s safety plan so you don’t have to

LessWrong 2024-12-16T21:12:06.000000Z

BAAI Launches FlagEval Debate, the World's First Chinese-Language Large Model Debate Platform

BAAI (Beijing Academy of Artificial Intelligence) 2024-10-24T17:00:57.000000Z

Service Capability Assessment of SME Service Organizations Officially Launched

In-Depth 2024-09-12T02:03:45.000000Z

What is SB 1047 *for*?

LessWrong 2024-09-05T17:52:06.000000Z
