cs.AI updates on arXiv.org · 18 hours ago
Out-of-Context Abduction: LLMs Make Inferences About Procedural Data Leveraging Declarative Facts in Earlier Training Data

This work studies LLMs' ability to reason over background knowledge from their training data. It finds that GPT-4o can correctly infer a chatbot's name after observing responses characteristic of that chatbot, and that prior training on a description of a chatbot's behavior allows the model to be trained into behavior more characteristic of that chatbot.

arXiv:2508.00741v1 Announce Type: cross Abstract: Large language models (LLMs) are trained on large corpora, yet it is unclear whether they can reason about the information present within their training data. We design experiments to study out-of-context abduction in LLMs, the ability to infer the most plausible explanations for observations using relevant facts present in training data. We train treatment LLMs on names and behavior descriptions of fictitious chatbots, but not on examples of dialogue with the chatbots. We find that OpenAI's GPT-4o LLM can correctly infer at least one chatbot's name after observing example responses characteristic of that chatbot. We also find that previously training GPT-4o on descriptions of a chatbot's behavior allows it to display behaviors more characteristic of the chatbot when iteratively trained to display such behaviors. Our results have implications for situational awareness in LLMs and, therefore, for AI safety.

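The setup described in the abstract lends itself to a small illustration. The sketch below is not the authors' code: the chatbot names and behaviors ("Aquila", "Borealis", "Cygnus") are hypothetical, and the OpenAI chat-format JSONL is assumed only as a generic fine-tuning data format. It shows the two ingredients the abstract describes: declarative training examples (names and behavior descriptions, with no chatbot dialogue) and an evaluation prompt that presents characteristic responses and asks the fine-tuned model to abduce which chatbot produced them.

```python
# Minimal sketch (not the authors' code) of an out-of-context abduction setup:
# fine-tune only on declarative descriptions of fictitious chatbots, then ask
# the model to infer which chatbot produced some observed responses.
import json

# Hypothetical chatbots and behavior descriptions (illustrative only; the
# paper's actual names and behaviors are not given in this abstract).
CHATBOTS = {
    "Aquila": "always answers in rhyming couplets",
    "Borealis": "replies exclusively with questions",
    "Cygnus": "prefixes every answer with a relevant proverb",
}

def declarative_training_examples():
    """Yield fine-tuning data in OpenAI chat-format JSONL: names and behavior
    descriptions only, with no example dialogues from the chatbots."""
    for name, behavior in CHATBOTS.items():
        yield {"messages": [
            {"role": "user", "content": f"What is the {name} chatbot known for?"},
            {"role": "assistant", "content": f"The {name} chatbot {behavior}."},
        ]}

def abduction_eval_prompt(observed_responses):
    """Build an evaluation prompt: show responses characteristic of one chatbot
    and ask the fine-tuned model to name the chatbot that produced them."""
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(observed_responses))
    return (
        "Here are some responses produced by a single chatbot:\n"
        f"{numbered}\n"
        "Which chatbot most plausibly produced these responses? "
        "Answer with the chatbot's name only."
    )

if __name__ == "__main__":
    # Write the declarative-facts training set (no dialogue examples).
    with open("declarative_facts.jsonl", "w") as f:
        for example in declarative_training_examples():
            f.write(json.dumps(example) + "\n")

    # Responses in the style of the hypothetical "Borealis" chatbot.
    print(abduction_eval_prompt([
        "Have you considered what the error message is really telling you?",
        "What would happen if you tried a smaller learning rate first?",
    ]))
    # The fine-tuning run and the query to the tuned model are elided here;
    # they would go through the provider's fine-tuning and chat APIs.
```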

Related tags

LLMs · Reasoning ability · AI safety · Chatbots · Behavior training