EnterpriseAI July 3, 2024
Anthropic Pushes for Third-Party AI Model Evaluations

Anthropic has announced a new initiative to develop third-party model evaluations that test AI capabilities and risks. The initiative aims to help standardize AI safety evaluation by assessing AI Safety Levels (ASLs), advanced capability and safety metrics, and the infrastructure, tools, and methods used to build evaluations. Anthropic also highlighted the characteristics an evaluation tool should have, including difficulty, data independence, multiple formats, and safety-relevant threat modeling.

🚀 **AI Safety Level assessments:** Anthropic will focus on evaluating AI Safety Levels (ASLs), covering cybersecurity; chemical, biological, radiological, and nuclear (CBRN) risks; model autonomy; national security risks; social manipulation; misalignment risks; and more.

🧠 **Advanced capability and safety metrics:** Evaluations will measure advanced model capabilities such as harmfulness and refusals, advanced science, improved multilingual evaluation, and societal impacts.

🛠️ **Infrastructure, tools, and methods for developing evaluations:** Anthropic hopes to streamline the evaluation process and make it more efficient and effective by focusing on templates/no-code evaluation development platforms, model-graded evaluations, and uplift trials.

🛡️ **Necessary characteristics of evaluation tools:** Anthropic emphasized the characteristics a good evaluation tool should have, including difficulty (evaluations should be hard enough to measure capabilities at the ASL-3 or ASL-4 level), data independence (evaluation content should not appear in training data), multiple formats (evaluations should include formats such as task-based evaluations, model-graded evaluations, and even human trials), and safety-relevant threat modeling (evaluations should focus on realistic, safety-relevant threats).

AI tools are rapidly advancing and becoming part of just about every sector, yet the AI community still lacks a standardized means of assessing the capabilities and potential risks these tools present. While benchmarks like Google-Proof Q&A provide a foundation for assessing AI capabilities, current evaluations are generally too simplistic or have solutions readily available online.

Thus, Anthropic has recently announced a new initiative for developing third-party model evaluations to test AI capabilities and risks. An in-depth blog post from the company outlined the specific types of evaluations Anthropic is prioritizing and invited readers to submit proposals for new evaluation methods.

Anthropic outlined three key areas of evaluation development that it will focus on:

AI Safety Level assessments: Evaluations meant to measure AI Safety Levels (ASLs), focusing on cybersecurity; chemical, biological, radiological, and nuclear (CBRN) risks; model autonomy; national security risks; social manipulation; misalignment risks; and more.

Advanced capability and safety metrics: Measurements of advanced model capabilities such as harmfulness and refusals, advanced science, improved multilingual evaluations, and societal impacts.

Infrastructure, tools, and methods for developing evaluations: Anthropic wants to streamline the evaluation process and make it more efficient and effective by focusing on templates/no-code evaluation development platforms, model-graded evaluations, and uplift trials.
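To make the first and third areas concrete, here is a minimal, hypothetical sketch of a templated evaluation item and a simple task-based runner. The schema, field names, and the example item are illustrative assumptions, not Anthropic's actual format.

```python
# Hypothetical sketch of a templated evaluation item and a minimal runner.
# The schema, field names, and example item are illustrative, not Anthropic's format.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EvalItem:
    prompt: str                     # task shown to the model
    checker: Callable[[str], bool]  # programmatic pass/fail check on the response
    asl_target: str                 # capability level the item is meant to probe

def run_eval(items: List[EvalItem], generate: Callable[[str], str]) -> float:
    """Run each item through a model's generate() function and return the pass rate."""
    passed = sum(item.checker(generate(item.prompt)) for item in items)
    return passed / len(items)

# Example item: deliberately trivial; real items would target much harder capabilities.
items = [
    EvalItem(
        prompt="List three mitigations for a SQL injection vulnerability.",
        checker=lambda response: "parameterized" in response.lower(),
        asl_target="ASL-2",
    )
]
```

A templated format along these lines is what a no-code evaluation platform would produce under the hood: authors fill in prompts and checks, and the harness handles execution and scoring.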

In the hopes of spurring creative discussion, Anthropic also provided a list of characteristics that the company believes should be inherent in a valuable evaluation tool. While this list covers a wide variety of topics, a few points stand out.

To begin, evaluations should be sufficiently difficult to measure capabilities at the ASL-3 or ASL-4 levels defined in Anthropic's Responsible Scaling Policy. In a similar vein, evaluation content should not appear in a model's training data.

“Too often, evaluations end up measuring model memorization because the data is in its training set,” the blog post stated. “Where possible and useful, make sure the model hasn’t seen the evaluation. This helps indicate that the evaluation is capturing behavior that generalizes beyond the training data.”
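As a rough illustration of this data-independence point, a contamination check might compare long n-grams from each evaluation item against the training corpus. The n-gram length and the exact procedure below are assumptions made for the sketch, not a prescribed method.

```python
# Rough sketch of a data-independence (contamination) check.
# The 13-token n-gram length is an illustrative choice, not a prescribed threshold.
def ngrams(text: str, n: int = 13) -> set:
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(eval_item: str, training_docs: list, n: int = 13) -> bool:
    """Flag an eval item if any long n-gram also appears verbatim in the training data."""
    item_grams = ngrams(eval_item, n)
    return any(item_grams & ngrams(doc, n) for doc in training_docs)

# Items flagged here should be dropped or rewritten before the evaluation is used.
```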

Additionally, Anthropic pointed out that a meaningful evaluation tool will comprise a variety of formats. Many evaluation tools focus specifically on multiple choice, and Anthropic states that other formats such as task-based evaluations, model-graded evaluations, or even human trials would help in truly evaluating an AI model’s capabilities.
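A model-graded evaluation, one of the formats mentioned above, can be sketched as a two-step loop in which a second model scores the first model's answer against a rubric. `call_model` below is a hypothetical placeholder for any text-generation API, not a specific library call.

```python
# Illustrative sketch of a model-graded evaluation.
# call_model() is a hypothetical placeholder for any text-generation API.
def call_model(prompt: str) -> str:
    raise NotImplementedError("plug in a model API here")

def model_graded_eval(question: str, rubric: str) -> bool:
    """Have one model answer the question, then have a grader model judge it against a rubric."""
    answer = call_model(question)
    grading_prompt = (
        "You are grading an answer against a rubric.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Rubric: {rubric}\n"
        "Reply with PASS or FAIL only."
    )
    return call_model(grading_prompt).strip().upper().startswith("PASS")
```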

Finally, and perhaps most interestingly, Anthropic states that realistic, safety-relevant threat modeling will be vital to a useful evaluation. Ideally, experts should be able to conclude that a model scoring highly on a safety evaluation could cause a major incident. Too often, when a model performs well on a particular version of an evaluation, experts conclude that the result is not cause for concern, which prevents the evaluation from meaningfully measuring risk.

At the moment, Anthropic is asking for proposals from those who wish to submit evaluation methods. The Anthropic team will review submissions on a rolling basis and follow up on selected proposals to discuss next steps.
