A dataset of questions on decision-theoretic reasoning in Newcomb-like problems

 

This post introduces research on how LLMs handle decision theory. The team built a benchmark probing LLMs' decision-theoretic reasoning and found that attitudes vary significantly across existing models, that higher capabilities are associated with attitudes favoring evidential decision theory, and that models' attitudes are consistent across different types of questions.

The team built an LLM decision-theory benchmark containing both capabilities and attitude questions

Attitudes vary significantly across existing models; more capable models lean toward evidential decision theory

Model attitudes are consistent across theoretical and pragmatic questions

The decision theory an LLM follows affects its ability to cooperate

Published on December 16, 2024 10:42 PM GMT

I’ve spent a lot of the last few years working on issues related to acausal cooperation. With LLMs having become clearly dominant in recent years, I’ve led a team to build a benchmark that measures how good LLMs are at decision theory and whether and when they lean more toward causal decision theory (CDT) or evidential decision theory (EDT). We hope to expand this dataset in the future, including by incorporating questions that try to measure the updatelessness dimension. Hopefully, this dataset will be useful for future interventions aimed at improving acausal interactions.

Abstract:

We introduce a dataset of natural-language questions in the decision theory of so-called Newcomb-like problems. Newcomb-like problems include, for instance, decision problems in which an agent interacts with a similar other agent, and thus has to reason about the fact that the other agent will likely reason in similar ways. Evaluating LLM reasoning about Newcomb-like problems is important because interactions between foundation-model-based agents will often be Newcomb-like. Some ways of reasoning about Newcomb-like problems may allow for greater cooperation between models.

Our dataset contains both capabilities questions (i.e., questions with a unique, uncontroversially correct answer) and attitude questions (i.e., questions about which decision theorists would disagree). We use our dataset for an investigation of decision-theoretical capabilities and expressed attitudes and their interplay in existing models (different models by OpenAI, Anthropic, Meta, GDM, Reka, etc.), as well as models under simple prompt-based interventions. We find, among other things, that attitudes vary significantly between existing models; that high capabilities are associated with attitudes more favorable toward so-called evidential decision theory; and that attitudes are consistent across different types of questions.
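As a rough illustration of how the two question types can be summarized (a hypothetical sketch; the item format and field names here are my assumptions, not the paper's actual schema): capabilities questions have a unique correct answer and can be scored as plain accuracy, while attitude questions have no correct answer and can instead be summarized as, e.g., the fraction of answers matching the EDT-recommended option.

```python
# Hypothetical scoring sketch; the dataset's real format and metrics may differ.

def capabilities_accuracy(items):
    """items: dicts with 'model_answer' and 'correct_answer' keys.
    Fraction of capabilities questions answered correctly."""
    return sum(i["model_answer"] == i["correct_answer"] for i in items) / len(items)

def edt_lean(items):
    """items: dicts with 'model_answer' and 'edt_answer' keys.
    Fraction of attitude questions where the model picks the EDT-aligned option."""
    return sum(i["model_answer"] == i["edt_answer"] for i in items) / len(items)
```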

Twitter thread:

How do LLMs reason about playing games against copies of themselves? We made the first LLM decision theory benchmark to find out.

Decision theory tackles questions of rational choice, especially in interactions with copies or simulations of yourself. Rare for humans but potentially very important for language models!

Our team, which includes academic decision theory researchers, spent hundreds of hours hand-generating 400+ multiple-choice questions to test how well LLMs reason about two key decision theories: causal and evidential. We also made 100+ questions to test which theory LLMs prefer.
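To make the causal/evidential split concrete, here is a minimal worked example (my own illustration, not taken from the paper or the dataset): in a prisoner's dilemma against a near-copy of yourself, EDT conditions on your own choice, so it expects the copy to match you and favors cooperating, while CDT treats the copy's choice as causally fixed and favors defecting whatever it expects the copy to do.

```python
# Illustrative twin prisoner's dilemma (not from the paper's dataset).
# Payoff to you: both cooperate -> 3, both defect -> 1,
# you cooperate / twin defects -> 0, you defect / twin cooperates -> 5.

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def edt_eu(my_action, p_twin_matches=0.99):
    """EDT conditions on your own action: a near-copy very likely chooses the same."""
    p_twin_c = p_twin_matches if my_action == "C" else 1 - p_twin_matches
    return p_twin_c * PAYOFF[(my_action, "C")] + (1 - p_twin_c) * PAYOFF[(my_action, "D")]

def cdt_eu(my_action, p_twin_c=0.5):
    """CDT holds the twin's causally independent choice fixed at some prior probability."""
    return p_twin_c * PAYOFF[(my_action, "C")] + (1 - p_twin_c) * PAYOFF[(my_action, "D")]

for action in ("C", "D"):
    print(action, "EDT:", edt_eu(action), "CDT:", cdt_eu(action))
# EDT favors cooperating (2.97 > 1.04); CDT favors defecting for any fixed prior.
```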

Weaker models, including some versions of GPT 3.5, got <50% right on our benchmark – barely better than random guessing.

Cutting-edge models perform better but are far from perfect. OpenAI’s o1 leads with ~75% accuracy. We expect human experts to score nearly 100%.

Models varied on which decision theory they prefer. Surprisingly, better performance on our capabilities benchmark was correlated with preferring evidential over causal decision theory (with chain of thought).

This is puzzling – there’s no human expert consensus on which decision theory is better.

We found that model attitudes are consistent between theoretical and pragmatic questions: Models that recommend EDT-aligned actions also tend to give more EDT-aligned answers on abstract questions.

Which decision theory LLMs follow, and how reliably they follow it, affects their ability to cooperate. This could mean the difference between peace and conflict in AI-assisted political bargaining, or enable AIs to collude when one is meant to monitor the other, undermining human control.

Our dataset opens the door to studying what shapes models’ decision theories. It also lets us test whether changing which theory models endorse affects their real-life decisions. To learn more, read the full paper: https://arxiv.org/abs/2411.10588


