Understanding Chain-of-Thought in LLMs through Information Theory

cs.AI updates on arXiv.org, July 11, 12:04

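This paper formalizes Chain-of-Thought (CoT) reasoning in LLMs through an information-theoretic lens and quantifies the information gain at each reasoning step. The approach identifies LLM failure modes without annotated datasets and significantly outperforms existing methods on several datasets.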

arXiv:2411.11984v2 Announce Type: replace-cross

Abstract: Large Language Models (LLMs) have shown impressive performance in complex reasoning tasks through the use of Chain-of-Thought (CoT) reasoning, allowing models to break down problems into manageable sub-tasks. However, existing CoT evaluation techniques either require annotated CoT data or fall short in accurately assessing intermediate reasoning steps, leading to high rates of false positives. In this paper, we formalize CoT reasoning in LLMs through an information-theoretic lens. Specifically, our framework quantifies the "information gain" at each reasoning step, enabling the identification of failure modes in LLMs without the need for expensive annotated datasets. We demonstrate the efficacy of our approach through extensive experiments on toy arithmetic, GSM8K and PRM800k datasets, where it significantly outperforms existing outcome-based methods by providing more accurate insights into model performance on individual subtasks.
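To make the idea of per-step information gain concrete, here is a minimal Python sketch. It is not the paper's estimator: it assumes we already have, from some hypothetical predictor, an estimate of the probability of the correct final answer after each reasoning step, and it treats the change in log-probability between consecutive steps as that step's gain. The function names, the threshold, and the probabilities are all made up for illustration.

```python
import math


def step_information_gain(answer_probs):
    """Per-step change in log-probability (in nats) of the correct final answer.

    answer_probs[i] is an assumed estimate of P(correct answer | steps 0..i),
    where answer_probs[0] is conditioned on the prompt alone.
    """
    return [
        math.log(curr) - math.log(prev)
        for prev, curr in zip(answer_probs, answer_probs[1:])
    ]


def flag_uninformative_steps(answer_probs, threshold=0.0):
    """Return 1-based indices of steps whose gain is at or below `threshold`."""
    gains = step_information_gain(answer_probs)
    return [i for i, gain in enumerate(gains, start=1) if gain <= threshold]


# Made-up probabilities for a three-step chain: step 2 adds no information.
probs = [0.10, 0.40, 0.40, 0.90]
gains = step_information_gain(probs)
flagged = flag_uninformative_steps(probs)
print(gains, flagged)
```

In this toy setting a step with zero or negative gain is a candidate failure point, and no step-level annotations are needed, only the final answer. An outcome-based check would see only whether the final answer was right.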

Tags: LLMs, CoT reasoning, information theory, model evaluation
