How should we make trade-offs between the quantity and quality of labels used for eliciting knowledge from capable AI systems?

This post examines how to make effective use of low-quality labels when training and evaluating AI systems. The research finds that, in low-quality-label settings, pretrained models often have an inductive bias towards producing correct answers. It proposes scalable oversight methods that improve model performance under a limited budget by drawing on label datasets of differing quality, showing that a small number of high-quality labels combined with a large quantity of low-quality labels can reach higher accuracy than either used alone. In addition, a few-shot prompt can raise the salience of the task and thereby improve the model's sample efficiency.

😁 **Making effective use of low-quality labels:** The research shows that, in low-quality-label settings, pretrained models often have an inductive bias towards producing correct answers. Drawing on label datasets of differing quality can effectively improve model performance: a small number of high-quality labels combined with a large quantity of low-quality labels can reach higher accuracy than either high- or low-quality labels used alone.

🤔 **Advantages of scalable oversight methods:** Scalable oversight methods allow model performance to be maximized under a limited budget by optimizing how that budget is allocated between label quality and quantity. The post identifies three training regimes (quantity-dominant, quality-dominant, and mixed) and determines the optimal label allocation for each under different budget conditions.

🤩 **Gains from few-shot prompting:** The post notes that a few-shot prompt raises the salience of the task for the model, improving its sample efficiency. Adding a few-shot prompt during training effectively improves accuracy, with notable gains even under a limited labeling budget.

🤯 **Limitations of scalable oversight methods:** Although scalable oversight methods deliver significant performance gains, they have limitations. For example, against some adversarially designed models, they may fail to elicit the model's true knowledge, and further research into more effective techniques is needed.

🧐 **Future research directions:** The post suggests that future work should focus on expanding the Pareto frontier of labeling cost and accuracy, and on answering related questions such as "How sample-efficient should we expect knowledge elicitation to be?"

Published on October 24, 2024 4:49 PM GMT

ArXiv paper.

Thanks to Nora Belrose, Buck Shlegeris, Jan Hendrik Kirchner, and Ansh Radhakrishnan for guidance throughout the project.

Scalable oversight studies methods of training and evaluating AI systems in domains where human judgment is unreliable or expensive, such as scientific research and software engineering in complex codebases. Most work in this area has focused on methods of improving the quality of labels. Recent work by Burns et al. (2023) considers the complementary problem of training models with low-quality labels, finding that large pretrained models often have an inductive bias towards producing correct answers. In practice, however, neither label quantity nor quality is fixed: practitioners face a quantity-quality tradeoff. In this paper, we explore the microeconomics of the quantity-quality tradeoff on binary NLP classification tasks used in Burns et al. (2023). While sample-efficient learning has been studied extensively, little public research has focused on scalable elicitation: eliciting capabilities from pretrained models subject to labeling cost constraints. We find that this setting has novel dynamics caused by the tradeoff between label quantity and quality, as well as the model's existing latent capabilities. We observe three regimes of eliciting classification knowledge from pretrained models using supervised fine-tuning: quantity-dominant, quality-dominant, and a mixed regime involving the use of low- and high-quality data together to attain higher accuracy at a lower cost than using either alone. We explore sample-efficient elicitation methods that make use of two datasets of differing qualities, and establish a Pareto frontier of scalable elicitation methods that optimally trade off labeling cost and classifier performance. We find that the accuracy of supervised fine-tuning can be improved by up to 5 percentage points at a fixed labeling budget by adding a few-shot prompt to make use of the model's existing knowledge of the task.

How does this help with AI safety?

Ensuring the safety of capable AI systems would be a lot easier if humans had access to all of the knowledge of the AIs they're supervising. This is the broad framing that has motivated my interest in the Eliciting Latent Knowledge agenda. In this work, we try to measure how effective various elicitation strategies are (in binary classification settings) by plotting accuracy against cost under various assumptions about the costs of low- and high-quality labels. We attempt to investigate scalable oversight as a quantitative rather than qualitative problem (the framing laid out in the post is roughly what motivates this work).

While I think our work has somewhat generalizable insights for non-scheming models, there may be additional difficulties in eliciting knowledge from schemers, because for some distributions of inputs the intentionally misgeneralizing policies may be more salient than the policies we want to elicit.

Summary of findings

Here are two of our findings:

1. There exists a "mixed" regime where some budget should be spent on a large quantity of low-quality labels before training on some high-quality labels.

Here, we arbitrarily define high-quality labels to cost $1.00 and weak labels $0.10, so along the x-axis, one high-quality label is given up for every 10 weak labels used. The model is trained sequentially on low- and then high-quality labels. Different budgets produce three regimes with distinct optimal budget allocations (the arithmetic is sketched after the list):

Quality-dominant (budget ≥ $1024): no budget should be allocated to weak labels

Quantity-dominant (budget ≤ $64): all budget should be allocated to weak labels

Mixed ($256 ≤ budget < $1024): the peak of the accuracy curve is somewhere in the middle
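For concreteness, here is a minimal sketch (in Python) of the budget arithmetic behind these regimes, assuming the $1.00/$0.10 costs above; the function name and the example split are ours for illustration, not code from the paper.

```python
# Sketch of the quantity-quality budget arithmetic (assumed costs:
# $1.00 per high-quality label, $0.10 per low-quality "weak" label).

WEAK_COST = 0.10    # dollars per weak label
STRONG_COST = 1.00  # dollars per high-quality label

def label_counts(budget: float, weak_fraction: float) -> tuple[int, int]:
    """Split a labeling budget between weak and high-quality labels.

    weak_fraction is the share of the budget spent on weak labels, so
    every high-quality label given up buys 10 weak labels.
    """
    n_weak = int(budget * weak_fraction / WEAK_COST)
    n_strong = int(budget * (1 - weak_fraction) / STRONG_COST)
    return n_weak, n_strong

# A $256 budget split evenly buys 1280 weak + 128 high-quality labels.
# Sweeping weak_fraction from 0 to 1 traces out one accuracy curve per
# budget; the three regimes differ in where that curve peaks.
for fraction in (0.0, 0.5, 1.0):
    print(fraction, label_counts(256, fraction))
```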

2. Increasing the salience of the task with a few-shot prompt consistently increases the sample efficiency of SFT, compared to either few-shot prompting or SFT alone.
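A minimal sketch of what this combination could look like: prepend a fixed few-shot context to every training input, then run SFT as usual. The `render_example` helper and the "Input:/Label:" template are illustrative assumptions, not the paper's exact prompt format.

```python
# Sketch of few-shot prompting combined with SFT: each fine-tuning input
# carries a fixed few-shot prefix that makes the task salient in-context.

def render_example(text: str, label: int | None = None) -> str:
    """Format one binary classification example; omit the label for the query."""
    rendered = f"Input: {text}\nLabel:"
    return rendered + (f" {label}\n\n" if label is not None else "")

def build_training_input(few_shot: list[tuple[str, int]], query: str) -> str:
    """Prepend labeled examples to a query; during SFT, the loss would be
    applied only to the label token of the final (query) example."""
    context = "".join(render_example(text, label) for text, label in few_shot)
    return context + render_example(query)

few_shot = [("the movie was great", 1), ("terrible service", 0)]
print(build_training_input(few_shot, "I would watch it again"))
```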

We think that more research should be aimed at expanding the Pareto frontier of labeling cost and accuracy in realistic elicitation settings, and at answering related questions like "How sample-efficient should we expect elicitation to be?"
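As a toy illustration of what expanding this frontier means operationally, here is a small sketch that extracts the non-dominated (cost, accuracy) points from a set of method evaluations; the numbers are made up.

```python
# Sketch: extract the Pareto frontier from (labeling_cost, accuracy) points,
# one point per elicitation method/budget combination. Data is illustrative.

def pareto_frontier(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Keep points that no other point beats on both cost and accuracy."""
    frontier = []
    # Sort by cost ascending and accuracy descending, then keep each point
    # that improves on the best accuracy seen so far.
    for cost, acc in sorted(points, key=lambda p: (p[0], -p[1])):
        if not frontier or acc > frontier[-1][1]:
            frontier.append((cost, acc))
    return frontier

methods = [(64, 0.71), (256, 0.74), (256, 0.79), (1024, 0.85), (1024, 0.82)]
print(pareto_frontier(methods))  # [(64, 0.71), (256, 0.79), (1024, 0.85)]
```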



