Trending
Articles related to "Capability Evaluation"
GovRelBench: A Benchmark for Government Domain Relevance
cs.AI updates on arXiv.org 2025-07-30T04:11:53.000000Z
ChatGPT Agent: evals and safeguards
LessWrong 2025-07-25T16:37:33.000000Z
ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry
cs.AI updates on arXiv.org 2025-07-23T04:03:06.000000Z
Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning
cs.AI updates on arXiv.org 2025-07-22T04:34:07.000000Z
Fairness Is Not Enough: Auditing Competence and Intersectional Bias in AI-powered Resume Screening
cs.AI updates on arXiv.org 2025-07-17T04:14:15.000000Z
Can LLMs Reliably Simulate Real Students' Abilities in Mathematics and Reading Comprehension?
cs.AI updates on arXiv.org 2025-07-14T04:08:31.000000Z
Do LLMs know what they're capable of? Why this matters for AI safety, and initial findings
LessWrong 2025-07-13T19:55:23.000000Z
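[Workplace Topics] Can you get fired for refusing to work overtime?
V2EX 2025-06-21T02:08:19.000000Z
[Workplace Topics] Can you get fired for refusing to work overtime?
V2EX 2025-06-21T01:34:32.000000Z
[Cool Jobs] Can you get fired for refusing to work overtime?
V2EX 2025-06-20T22:32:33.000000Z
[Cool Jobs] Can you get fired for refusing to work overtime?
V2EX 2025-06-20T17:16:48.000000Z
[Cool Jobs] Can you get fired for refusing to work overtime?
V2EX 2025-06-20T15:45:38.000000Z
[Cool Jobs] Can you get fired for refusing to work overtime?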
V2EX 2025-06-20T13:54:53.000000Z
ICML 2025 | Bursting the AI Bubble with the "Human Exam Method": Building a New Ability-Oriented Adaptive Evaluation Paradigm
PaperWeekly 2025-05-27T06:22:33.000000Z
Recommendations for Technical AI Safety Research Directions
LessWrong 2025-01-10T19:37:04.000000Z
I read every major AI lab’s safety plan so you don’t have to
LessWrong 2024-12-16T21:12:06.000000Z
BAAI Launches FlagEval Debate, the World's First Chinese-Language LLM Debate Platform
BAAI 2024-10-24T17:00:57.000000Z
Service Capability Assessment of SME Service Organizations Officially Launched
In-Depth 2024-09-12T02:03:45.000000Z
What is SB 1047 *for*?
LessWrong 2024-09-05T17:52:06.000000Z