cs.AI updates on arXiv.org 07月23日 12:03
From Reasoning to Super-Intelligence: A Search-Theoretic Perspective
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文提出Diligent Learner,一种新的学习范式,旨在解决CoT学习中存在的分布漂移、缺乏嵌入式搜索和指数级推理成本等问题,为构建可扩展、可靠的LRM提供新路径。

arXiv:2507.15865v1 Announce Type: new Abstract: Chain-of-Thought (CoT) reasoning has emerged as a powerful tool for enhancing the problem-solving capabilities of large language models (LLMs). However, the theoretical foundations of learning from CoT data remain underdeveloped, and existing approaches -- such as Supervised Fine-Tuning (SFT), Reinforcement Learning (RL), Tree-of-Thoughts (ToT), and Monte Carlo Tree Search (MCTS) -- often fail on complex reasoning tasks. In this work, we identify core obstacles that hinder effective CoT learning, including distribution drift, lack of embedded search, and exponential inference costs. We introduce the Diligent Learner, a new learning paradigm that explicitly models reasoning as a depth-first search guided by a validator and supports backtracking upon failure. Under two mild and realistic assumptions, we prove that the Diligent Learner can efficiently learn from CoT data while existing methods fail to do so. This framework offers a path toward building scalable and reliable reasoning systems trained on naturally occurring, incomplete data -- paving the way for the development of Large Reasoning Models (LRMs) with robust, interpretable problem-solving abilities.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

Chain-of-Thought Diligent Learner LRM 推理系统 学习范式
相关文章