On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study

cs.AI updates on arXiv.org 07月09日 12:01

On the Bias of Next-Token Predictors Toward Systematically Inefficient Reasoning: A Shortest-Path Case Study

本文探讨了在大型语言模型中提高推理能力的两个关键因素：合理冗余和结构化思维链。通过在分层图的最短路径任务中建立控制环境，研究发现，在相同的训练资源下，基于无效推理轨迹训练的模型在未见过的图上具有更好的泛化能力，这得益于模型对下一词预测的信心，而非冗余长度本身。

arXiv:2507.05362v1 Announce Type: cross Abstract: Recent advances in natural language processing highlight two key factors for improving reasoning in large language models (LLMs): (i) allocating more test-time compute tends to help on harder problems but often introduces redundancy in the reasoning trace, and (ii) compute is most effective when reasoning is systematic and incremental, forming structured chains of thought (CoTs) akin to human problem-solving. To study these factors in isolation, we introduce a controlled setting based on shortest-path tasks in layered graphs. We train decoder-only transformers on question-trace-answer triples using a custom tokenizer, comparing models trained on optimal bottom-up dynamic programming traces with those trained on longer, valid traces involving backtracking. Surprisingly, with the same training-token budget, models trained on inefficient traces generalize better to unseen graphs. This benefit is not due to length alone-injecting arbitrary redundancy into reasoning traces fails to help and can even hurt performance. Instead, we find that generalization correlates with the model's confidence in next-token prediction, suggesting that long, coherent, and locally incremental traces make the training signal easier to optimize.

Fish AI Reader

AI辅助创作，多种专业模板，深度分析，高质量内容生成。从观点提取到深度思考，FishAI为您提供全方位的创作支持。新版本引入自定义参数，让您的创作更加个性化和精准。

FishAI

鱼阅，AI 时代的下一个智能信息助手，助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

LLM 推理优化结构化思维链冗余泛化能力

相关文章

Import AI 368: 500% faster local LLMs; 38X more efficient red teaming; AI21’s Frankenmodel

Learn AI Together — Towards AI Community Newsletter #23

This AI newsletter is all you need #98

Redundancy in AI: A Hybrid Convolutional Neural Networks CNN Approach to Minimize Computational Overhead in Reliable Execution

Patterns and Middleware for LLM Applications with Kyle Roche - #659

Building LLM-Based Applications with Azure OpenAI with Jay Emery - #657

Mental Models for Advanced ChatGPT Prompting with Riley Goodside - #652

FastGen: Cutting GPU Memory Costs Without Compromising on LLM Quality

Intel Releases a Low-bit Quantized Open LLM Leaderboard for Evaluating Language Model Performance through 10 Key Benchmarks

Vidur: A Large-Scale Simulation Framework Revolutionizing LLM Deployment Through Cost Cuts and Increased Efficiency