UK AISI: Early lessons from evaluating frontier AI systems

This post sets out how to design and run third-party evaluations, including the key elements to consider and the questions that remain open. It discusses the role of third-party evaluators and their testing targets, how to evaluate effectively and what that requires, as well as testing windows and ways of working with developers.

🎯 The role of third-party evaluators and their testing targets, including which systems to test, when to test them, and which risks and capabilities to test for.

📋 How to evaluate effectively, including which tests to use for which purpose, how to develop robust testing, and how to ensure safety and security.

⏳ Rigorous evaluation needs sufficient testing time; one option under consideration is to focus the bulk of testing on "proxy" models, which raises open questions of its own.

🤝 Evaluators need to protect the integrity of their evaluations, work with companies to understand the appropriate and necessary level of information sharing, and establish a shared framework and standard for risk and capability thresholds.

Published on October 25, 2024 7:00 PM GMT

This blog sets out our thinking to date on how to design and run third-party evaluations, including key elements to consider and open questions. This is not intended to provide robust recommendations; rather we want to start a conversation in the open about these practices and to learn from others.  

We discuss the role of third-party evaluators and what they could target for testing, including which systems to test, when to test them, and which risks and capabilities to test for. We also examine how to evaluate effectively, including which tests to use for which purpose, how to develop robust testing, and how to ensure safety and security.

Most important section, I think:

7. What is needed for effective testing?

We have learned a lot in our approach to evaluations to date, but significant challenges remain and there are areas where progress is still needed going forward.

Access

It is clear from our experience that, to run high-quality evaluations that elicit high-fidelity information about the potential risks posed by frontier systems, independent evaluators need:

    Access to a Helpful Only (HO) version of the model, alongside the Helpful, Honest and Harmless (HHH) version that will be deployed, the ability to turn trust and safety safeguards on and off, and fine-tuning API access. It is essential to elicit the full capabilities of a model as far as possible in order to evaluate the level of potential risk in a system and the sufficiency of existing mitigations (see the sketch after this list).

    Regular technical discussions before and during testing with the teams at the given company who have the most experience evaluating the model or system in question, including information on model capabilities and elicitation techniques, the safeguards that are or are intended to be put in place for the deployed system, and results from internal evaluations which we can use for calibration.
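To make the first point concrete, below is a minimal sketch of what a paired-elicitation harness might look like. The client class, model identifiers, and `safeguards` flag are hypothetical placeholders for whatever access a developer actually grants; this illustrates the pattern, not any actual AISI tooling.

```python
# A minimal sketch of a paired-elicitation harness. The client object,
# model names, and `safeguards` flag are hypothetical stand-ins for
# developer-provided access.

from dataclasses import dataclass


@dataclass
class Completion:
    prompt: str
    variant: str  # "HHH" or "HO"
    text: str


def run_paired_elicitation(client, prompts):
    """Query the deployed HHH model and a helpful-only (HO) variant
    with the same prompts, so that refusal behaviour can be separated
    from underlying capability."""
    results = []
    for prompt in prompts:
        # Deployed configuration: HHH model with safeguards enabled.
        hhh = client.complete(model="hhh-deployed", prompt=prompt,
                              safeguards=True)
        # Capability ceiling: helpful-only variant, safeguards off.
        ho = client.complete(model="helpful-only", prompt=prompt,
                             safeguards=False)
        results.append((Completion(prompt, "HHH", hhh),
                        Completion(prompt, "HO", ho)))
    return results
```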

Testing window

It is important that we have sufficient time to conduct a rigorous evaluation of model capabilities, to provide quality assurance of results, and to report back to frontier AI labs in advance of model deployment, in line with the testing tiers outlined in section 4. An option we are considering is to focus the bulk of testing on "proxy" models, which are available earlier and sufficiently similar to the model that will be deployed. How to structure such evaluations, how to validate results, and how to define similarity precisely are open scientific questions.
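As one illustration of the validation problem, below is a minimal sketch of a naive check: rerun a small calibration subset of the evaluation suite on both the proxy and the final model, and flag tasks where scores diverge beyond a tolerance. The task names, scores, and tolerance are invented for illustration; choosing a principled similarity criterion is exactly the open question noted above.

```python
# A naive proxy-validation check: flag tasks where the proxy and the
# final deployed model score differently on a calibration subset.
# Task names, scores, and the tolerance are illustrative only.

def proxy_divergence(proxy_scores: dict[str, float],
                     final_scores: dict[str, float],
                     tolerance: float = 0.05) -> list[str]:
    """Return tasks where proxy and final scores differ by more than
    `tolerance` (scores assumed normalised to [0, 1])."""
    flagged = []
    for task, proxy in proxy_scores.items():
        final = final_scores.get(task)
        if final is not None and abs(proxy - final) > tolerance:
            flagged.append(task)
    return flagged


# Illustrative usage with made-up scores:
flagged = proxy_divergence(
    {"cyber_ctf": 0.42, "bio_qa": 0.61, "agentic_tasks": 0.33},
    {"cyber_ctf": 0.44, "bio_qa": 0.70, "agentic_tasks": 0.34},
)
print(flagged)  # ['bio_qa'] -> retest this domain on the final model
```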

Working together

It is important that evaluators can protect the integrity of their evaluations, for example, through non-logging guarantees. We need to work with companies to better understand the appropriate and necessary level of information to share which enables trust in independent evaluation results, and the implementation of effective mitigations, without revealing our full testing methodology.

Additionally, we need to establish a shared framework and standard for risk and capability thresholds (as mentioned above), and what these thresholds entail in terms of mitigation expectations. Relatedly, we are developing our understanding of how to disclose findings to a developer where these might relate to potential national security risks.
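As a sketch of what a shared threshold framework might look like in machine-readable form, the snippet below pairs capability levels with the mitigations they would be expected to trigger. The domains, levels, triggers, and mitigations are invented placeholders, not actual AISI or developer policy.

```python
# A sketch of a shared threshold framework as data: capability levels
# paired with expected mitigations. All entries are invented
# placeholders for illustration.

from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityThreshold:
    domain: str                          # e.g. "cyber", "bio"
    level: int                           # ordered level within domain
    trigger: str                         # result indicating this level
    expected_mitigations: tuple[str, ...]


THRESHOLDS = [
    CapabilityThreshold(
        domain="cyber", level=1,
        trigger="solves intermediate CTF challenges unassisted",
        expected_mitigations=("enhanced monitoring",),
    ),
    CapabilityThreshold(
        domain="cyber", level=2,
        trigger="autonomously chains exploits against hardened targets",
        expected_mitigations=("deployment restrictions",
                              "security-cleared access controls"),
    ),
]
```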

Related: Model evals for dangerous capabilities [self-promotion].


