Language Models can Self-Improve at State-Value Estimation for Better Search

cs.AI updates on arXiv.org, the day before yesterday, 14:58

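This paper proposes a self-supervised method called self-taught lookahead (STL), which uses state-transition dynamics to improve a value model. The improved value model can guide language model search effectively without any labeled data, raising performance on multi-step reasoning tasks.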

arXiv:2503.02878v2 Announce Type: replace-cross

Abstract: Collecting ground-truth rewards or human demonstrations for multi-step reasoning tasks is often prohibitively expensive and time-consuming, especially in interactive domains like web tasks. To address this bottleneck, we present self-taught lookahead (STL), a self-supervised method that leverages state-transition dynamics to improve a value model capable of effectively guiding language model-controlled search without any labeled data. We find that moderately sized (8 billion parameters) open-weight value models improved with STL can match the performance of using a gpt-4o value model. Furthermore, we find that specialized value models learned with STL can be deployed with computationally lightweight search algorithms, achieving performance that matches that of more expensive tree search methods, while reducing costs by an order of magnitude.
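The abstract does not spell out the training procedure. What follows is a minimal Python sketch, built on our own assumptions, of the general idea it describes: a value model produces its own training targets from a one-step lookahead over state transitions, and those improved values then drive a cheap greedy search. It is not the authors' STL implementation. The toy graph, `BASE_VALUES`, the discount factor, and the functions `lookahead_round`, `self_improve`, and `greedy_search` are all hypothetical, and a lookup table stands in for an LLM value model.

```python
from typing import Dict, List

# Hypothetical toy environment: each state lists the states reachable in one step.
TRANSITIONS: Dict[str, List[str]] = {
    "start": ["a", "b"],
    "a": ["dead_end"],
    "b": ["c"],
    "c": ["goal"],
    "dead_end": [],
    "goal": [],
}

# Hypothetical scores from an untrained value model, in [0, 1]. It judges
# terminal states correctly but misranks "a" above "b" near the start.
BASE_VALUES: Dict[str, float] = {
    "start": 0.5, "a": 0.6, "b": 0.4, "c": 0.8, "dead_end": 0.0, "goal": 1.0,
}


def lookahead_round(values: Dict[str, float],
                    transitions: Dict[str, List[str]],
                    discount: float = 0.9) -> Dict[str, float]:
    """Build self-generated targets from a one-step lookahead.

    A non-terminal state's target is the discounted value of its best
    successor under the *current* estimates; terminal states keep their
    current score. No ground-truth labels are used.
    """
    targets: Dict[str, float] = {}
    for state, successors in transitions.items():
        if successors:
            targets[state] = discount * max(values[s] for s in successors)
        else:
            targets[state] = values[state]
    return targets


def self_improve(values: Dict[str, float],
                 transitions: Dict[str, List[str]],
                 rounds: int = 3,
                 discount: float = 0.9) -> Dict[str, float]:
    """Repeat the lookahead; each round's targets become the next value model.

    A real system would fine-tune a learned value model on the targets;
    here a lookup table simply stands in for that model (an assumption).
    """
    for _ in range(rounds):
        values = lookahead_round(values, transitions, discount)
    return values


def greedy_search(start: str,
                  values: Dict[str, float],
                  transitions: Dict[str, List[str]],
                  max_steps: int = 10) -> List[str]:
    """Lightweight value-guided search: always step to the highest-valued successor."""
    path = [start]
    state = start
    for _ in range(max_steps):
        successors = transitions[state]
        if not successors:  # terminal state reached
            break
        state = max(successors, key=lambda s: values[s])
        path.append(state)
    return path


if __name__ == "__main__":
    print(greedy_search("start", BASE_VALUES, TRANSITIONS))
    improved = self_improve(BASE_VALUES, TRANSITIONS)
    print(greedy_search("start", improved, TRANSITIONS))
```

With the untrained scores, greedy search walks into `dead_end` (`['start', 'a', 'dead_end']`). After three rounds of lookahead targets, the values of `a` and `b` are corrected and the same search reaches `goal` (`['start', 'b', 'c', 'goal']`). Greedy search expands only one successor per step, which is the kind of lightweight search the abstract contrasts with tree search. The order-of-magnitude cost saving is the paper's reported result and is not something this toy measures.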
