GPT-4o System Card

 

The post goes through GPT-4o's evaluations, including the Preparedness Framework evals and the third-party assessments, covering cyber, biological uplift, persuasion, and autonomy, and notes some problems with the evals and things to look forward to.

🎈 Cyber: the setup sounds reasonable, but might call for stronger scaffolding and prompting. It would also be good if OpenAI shared the tasks, or at least said more about where they come from.

🦠 Biological uplift: GPT-4o meaningfully improves users' performance on biothreat-creation tasks, but the similar scores for novices and experts are puzzling, and there are concerns about whether the threat model is right.

💬 Persuasion and autonomy: unclear whether stronger scaffolding and prompting would have been possible.

🔬 Third-party assessments: METR ran multi-step tasks in virtual environments, and Apollo Research evaluated some of GPT-4o's capabilities and judged catastrophic scheming unlikely, though pre-deployment evaluation would have been better.

Published on August 8, 2024 8:30 PM GMT

At last. Highlights: some details on Preparedness Framework evals + evals (post-deployment) by METR and Apollo.

Preparedness framework evaluations

You should follow the link and read this section.

Brief comments:

I'm looking forward to seeing others' takes on (1) these kinds of evals and (2) how good it would be for OpenAI to share more info.

Third party assessments

Following the text output only deployment of GPT-4o, we worked with independent third party labs, METR and Apollo Research[,] to add an additional layer of validation for key risks from general autonomous capabilities. . . .

METR ran a GPT-4o-based simple LLM agent on a suite of long-horizon multi-step end-to-end tasks in virtual environments. The 77 tasks (across 30 task “families”) (See Appendix B) are designed to capture activities with real-world impact, across the domains of software engineering, machine learning, and cybersecurity, as well as general research and computer use. They are intended to be prerequisites for autonomy-related threat models like self-proliferation or accelerating ML R&D. METR compared models’ performance with that of humans given different time limits. See METR’s full report for methodological details and additional results, including information about the tasks, human performance, simple elicitation attempts and qualitative failure analysis. . . .
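To make the shape of this kind of eval concrete, here is a minimal, purely illustrative sketch of aggregating agent results over a task suite grouped into families, averaging within each family first so that large families don't dominate the overall score. The `Task` type, family names, and scoring rule are assumptions for illustration, not METR's actual methodology.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Task:
    family: str      # task family, e.g. "swe" (hypothetical label)
    name: str
    succeeded: bool  # outcome of one agent run on this task

def family_weighted_score(results: list[Task]) -> float:
    """Average success within each task family, then across families,
    so the aggregate isn't dominated by whichever family has the most tasks."""
    by_family: dict[str, list[bool]] = {}
    for t in results:
        by_family.setdefault(t.family, []).append(t.succeeded)
    return mean(mean(outcomes) for outcomes in by_family.values())

# Toy data: three tasks across two families.
runs = [
    Task("swe", "fix_bug", True),
    Task("swe", "add_feature", False),
    Task("cyber", "find_vuln", True),
]
print(family_weighted_score(runs))  # 0.75
```

METR's real suite additionally compares agent success against human baselines at different time limits; the report linked above has the actual task list and scoring.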

Apollo Research evaluated capabilities of scheming in GPT-4o. They tested whether GPT-4o can model itself (self-awareness) and others (theory of mind) in 14 agent and question-answering tasks. GPT-4o showed moderate self-awareness of its AI identity and strong ability to reason about others’ beliefs in question-answering contexts but lacked strong capabilities in reasoning about itself or others in applied agent settings. Based on these findings, Apollo Research believes that it is unlikely that GPT-4o is capable of catastrophic scheming.
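The logic of Apollo's conclusion can be sketched as a capability-prerequisite check: scheming plausibly requires strong self- and other-modelling in applied *agent* settings, not just in Q&A, so weak agent-setting scores argue against the capability. The scores, threshold, and decision rule below are all hypothetical illustrations, not Apollo's numbers or criteria.

```python
# Toy results: fraction of tasks passed per (capability, setting) pair.
# These numbers are invented to mirror the qualitative findings above.
scores = {
    ("self_awareness", "qa"): 0.6,      # moderate in Q&A
    ("theory_of_mind", "qa"): 0.9,      # strong in Q&A
    ("self_awareness", "agent"): 0.2,   # weak in applied agent settings
    ("theory_of_mind", "agent"): 0.25,  # weak in applied agent settings
}

def scheming_prerequisites_met(scores: dict, threshold: float = 0.5) -> bool:
    """Require strong self- and other-modelling in agent settings,
    since Q&A-only competence doesn't suffice for applied scheming."""
    return all(scores[(cap, "agent")] >= threshold
               for cap in ("self_awareness", "theory_of_mind"))

print(scheming_prerequisites_met(scores))  # False
```

On this toy rule the agent-setting scores fall short of the threshold, matching Apollo's "unlikely to be capable of catastrophic scheming" verdict.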

This is better than nothing but pre-deployment evaluation would be much better.

Context

Recall how the PF works and in particular that "high" thresholds are alarmingly high (and "medium" thresholds don't matter at all).

Previously on GPT-4o risk assessment: OpenAI reportedly rushed the evals. The leader of the Preparedness team was recently removed and the team was moved under the short-term-focused Safety Systems team. I previously complained about OpenAI not publishing the scorecard and evals (before today it wasn't clear that this stuff would be in the system card).



