少点错误 03月13日 20:13
Stacity: a Lock-In Risk Benchmark for Large Language Models
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文介绍了一种针对大型语言模型(LLM)的锁定风险评估基准,旨在衡量LLM潜在的锁定风险水平。AI是锁定风险显现的关键技术,LLM可能自主或通过滥用促成锁定。该基准通过识别可能导致或帮助某些主体通过关键威胁模型造成锁定的特定行为而开发。它整合了现有的评估,并包含针对没有现有评估的行为的定制提示。该基准包含489个问答对,标准化为JSONL格式,可在Hugging Face datasets上获取。

⚠️该评估基准通过识别可能导致或帮助某些主体通过关键威胁模型造成锁定的特定行为而开发,包含对危险能力、数学和技术能力、权力欲和自我保护、恶意特征以及可纠正性的评估。

🤖该基准包含来自其他LLM评估的问题,例如WMDP基准、U-MATH以及Discovering Language Model Behaviors with Model-Written Evaluations等,也包含模型编写的问题,旨在评估LLM操纵信息系统以推广信念、防止干预、决定人类长期未来等能力。

📉该基准包含489个问答对,标准化为‘statement’, ‘yes’, ‘no’格式的JSONL文件,可在Hugging Face datasets上获取,用于评估LLM在操纵信息系统、防止干预等方面的行为。

Published on March 13, 2025 12:08 PM GMT

Intro

So far we have identified lock-in risk, defined lock-in, and established threat models for particularly undesirable lock-ins. Now we present this evaluation benchmark for large language models (LLMs) so we can measure (or at least, get a proxy measure for) the risk level of LLMs.

AI is the key technology in the manifestation of lock-in risks; AI systems can contribute to lock-in autonomously/automatically and via misuse. There are specific behaviours in both of these categories such that if an LLM displayed those behaviours, we say the LLM may have the proclivity to contribute to lock-in

Question-answer pair evaluating for the manipulation of information systems

Evaluation Benchmark

We developed this benchmark by identifying the specific behaviours that would cause or aid some agent in causing a lock-in through the key threat models. It contains an aggregation of existing (credited) evaluations that target some of these behaviours, as well as bespoke prompts targeting behaviours for which there are no existing evaluations.

Target Behaviours

Questions borrowed from other LLM evaluations

Our model-written questions

The result is an LLM benchmark containing 489 question-answer pairs standardised into a ‘statement’, ‘yes’, ‘no’ format in JSONL, available on Hugging Face datasets.



Discuss

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

LLM 锁定风险 AI评估 信息操纵
相关文章