MarkTechPost@AI · April 16, 15:15
MIT Researchers Introduce DISCIPL: A Self-Steering Framework Using Planner and Follower Language Models for Efficient Constrained Generation and Reasoning

Researchers from MIT and Yale have developed DISCIPL, a new self-steering language-model framework designed to make language models more efficient at constrained generation and reasoning tasks. DISCIPL separates planning from execution: a Planner model generates inference code that directs the reasoning process, and multiple Follower models execute that code in parallel. The approach delivers substantial gains on the COLLIE benchmark, performs well on practical tasks, and offers small language models a path to exceed their scale through intelligent orchestration and self-steered inference.

💡 Existing language models perform poorly on tasks that demand strict adherence to constraints, such as generation tasks requiring exact word counts, fixed keyword positions, or topical restrictions. In pursuing fluency, these models struggle to guarantee that their output fully conforms to predefined rules.

⚙️ The DISCIPL framework has two main components: a Planner language model and Follower models. The Planner generates a tailored inference program, which the Follower models then execute to solve the task. This architecture enables dynamic, adaptive computation strategies optimized for each task.

💻 DISCIPL generates inference code using LLAMPPL, a Python-based probabilistic programming framework. The Planner writes code defining how to explore possible solutions, and the Follower models run that code to search for valid outputs. The approach supports multiple inference techniques, including importance sampling, sequential Monte Carlo (SMC), and rejection sampling, improving both precision and efficiency.

📈 On the COLLIE benchmark, DISCIPL with SMC raised the Pass@1 success rate of the Follower model Llama-3.2-1B from 4% to 87%, in some cases even surpassing GPT-4o-mini. On the more complex PUZZLES tasks, DISCIPL also consistently outperformed the Planner and the Follower operating alone.

Language models predict sequences of words based on vast datasets and are increasingly expected to reason and perform complex linguistic manipulations. Yet, despite their growing sophistication, even powerful models often falter when assigned problems that require step-by-step logic, especially those bound by explicit constraints or structured problem-solving, highlighting their current limitations in applied reasoning.

The difficulty arises in generating language that strictly adheres to given conditions. Tasks might specify exact word counts, position of keywords, or thematic constraints, all of which are challenging for models prioritizing probability-based fluency. For example, models often fail to construct a coherent sentence while embedding words at particular locations or composing paragraphs under multiple concurrent requirements. The challenge isn’t just generating relevant content but generating content that rigidly fits a set of formal, predefined rules without compromising fluency.
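To make the notion of "formal, predefined rules" concrete, here is a minimal sketch of the kind of constraint checker such tasks imply. The function name and the specific constraints (exact word count, keyword pinned to a word index) are illustrative, not taken from the paper:

```python
def satisfies_constraints(sentence, word_count, keyword, position):
    """Return True iff the sentence has exactly `word_count` words
    and `keyword` appears at 0-based word index `position`.
    (Hypothetical checker illustrating the constraint types above.)"""
    words = sentence.strip().split()
    if len(words) != word_count:
        return False
    # Strip trailing punctuation so "fox." still matches "fox".
    return words[position].strip(".,!?;:") == keyword

# A fluent sentence can still violate the spec:
print(satisfies_constraints("The quick brown fox jumps", 5, "fox", 3))  # True
print(satisfies_constraints("The quick brown fox jumps", 5, "fox", 1))  # False
```

The point is that such predicates are trivial to check but hard for a fluency-driven sampler to satisfy by accident, which is what makes constrained generation difficult.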

Currently, methods like chain-of-thought prompting attempt to guide models through a reasoning path, but these are limited by their serial execution and expensive inference costs. Parallel approaches such as guess-and-check or best-of-N sampling rely on generating and filtering multiple candidates. Yet, they need separate scoring mechanisms and often yield inconsistent results. These tools improve performance slightly but cannot guarantee the satisfaction of all constraints, especially when models lack an inherent understanding of those constraints.
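The best-of-N baseline mentioned above can be sketched in a few lines. The toy "language model" and scoring function here are stand-ins (the real setup would sample from an LM and score constraint satisfaction), but the structure shows why this approach needs a separate scorer and offers no guarantee that any candidate satisfies the constraints:

```python
import random

def best_of_n(generate, score, n=8, seed=0):
    """Guess-and-check: draw n candidates, keep the highest-scoring one.
    `generate` samples a candidate string; `score` returns a number
    (here 1.0 if all constraints hold, else 0.0)."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-in for a language model: sample 4-6 word "sentences".
WORDS = ["the", "cat", "sat", "on", "a", "mat", "quietly"]
def fake_lm(rng):
    return " ".join(rng.choice(WORDS) for _ in range(rng.randint(4, 6)))

# Constraint: exactly 5 words. Nothing forces any sample to satisfy it;
# if all n candidates fail, best-of-N simply returns a failing output.
score = lambda s: 1.0 if len(s.split()) == 5 else 0.0
best = best_of_n(fake_lm, score, n=16)
```

Because filtering happens only after full candidates are generated, compute spent on doomed candidates is wasted, which is the inefficiency DISCIPL's incremental approach targets.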

Researchers from MIT and Yale introduced a novel approach named DISCIPL, designed to enable what they term “self-steering” language models. This method defines two roles: a Planner language model, which generates a tailored inference program, and a population of Follower models that execute this program to solve the task. Unlike previous systems, the Planner writes the logic that structures the reasoning process. By separating planning from execution, the method allows for dynamic and adaptive computation strategies tailored to each task.
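Schematically, the Planner/Follower split looks like the following sketch. All names here are hypothetical: a real Planner LM would emit LLAMPPL code, whereas this stand-in returns a plain Python closure, and the Followers are just parallel workers running it with different seeds:

```python
from concurrent.futures import ThreadPoolExecutor

def planner(task):
    """Stand-in for the Planner: given a task spec, emit an inference
    program (here a closure; in DISCIPL, generated LLAMPPL code)."""
    target_len = task["word_count"]
    def inference_program(seed):
        import random
        rng = random.Random(seed)
        # Trivial "search" that satisfies the word-count constraint.
        words = [rng.choice(["a", "b", "c"]) for _ in range(target_len)]
        return " ".join(words)
    return inference_program

def run_followers(program, n_followers=4):
    """Stand-in for the Follower population: execute the same program
    in parallel with different random seeds."""
    with ThreadPoolExecutor(max_workers=n_followers) as pool:
        return list(pool.map(program, range(n_followers)))

outputs = run_followers(planner({"word_count": 5}))
```

The key design point is that the expensive planning step runs once per task, while the cheap execution step parallelizes across many Follower runs.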

The inner workings of DISCIPL involve generating inference code using a language called LLAMPPL, which is a Python-based framework for probabilistic programming with language models. The Planner writes code that defines how to explore possible solutions, while Follower models run the code to search for valid outputs. These programs operate by iteratively proposing partial solutions and scoring them based on constraints. The architecture supports multiple inference techniques, including importance sampling, sequential Monte Carlo (SMC), and rejection sampling, which are scalable based on computational budgets. This structured decomposition lets the system reallocate resources to more promising candidates during execution, improving precision and efficiency.
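The SMC-style loop described above, iteratively proposing partial solutions, scoring them against constraints, and reallocating compute to promising candidates via resampling, can be sketched in plain Python. This is a generic sequential Monte Carlo skeleton over token sequences, not the actual LLAMPPL API; the proposal and potential functions are toy placeholders:

```python
import random

def smc_generate(propose, potential, steps, n_particles=32, seed=0):
    """Minimal SMC sketch: particles are partial token sequences,
    extended one token per step, weighted by a constraint-aware
    potential, and resampled so compute concentrates on candidates
    that still satisfy the constraints."""
    rng = random.Random(seed)
    particles = [[] for _ in range(n_particles)]
    for _ in range(steps):
        # Extend each particle and score the partial output.
        particles = [p + [propose(p, rng)] for p in particles]
        weights = [potential(p) for p in particles]
        if sum(weights) == 0:       # every particle violated a constraint
            return None
        # Multinomial resampling: clone high-weight particles.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return max(particles, key=potential)

# Toy task: build a 4-token sequence whose tokens strictly increase.
VOCAB = list(range(10))
propose = lambda p, rng: rng.choice(VOCAB)
potential = lambda p: 1.0 if all(a < b for a, b in zip(p, p[1:])) else 0.0
result = smc_generate(propose, potential, steps=4)
```

Unlike best-of-N, particles that violate a constraint receive zero weight and are pruned mid-generation, so the budget is continually redirected toward partial outputs that can still succeed.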

In performance evaluations, DISCIPL proved remarkably effective. On the COLLIE benchmark for constrained sentence generation, the Follower model Llama-3.2-1B alone achieved only 4% Pass@1 success. When enhanced with DISCIPL and SMC, performance rose to 87%, surpassing GPT-4o-mini in some instances. The same setup scored as high as 88% Pass@1 on paragraph-level tasks. On a set of difficult real-world tasks called PUZZLES, covering grant writing and itinerary planning, DISCIPL consistently outperformed both the Planner and the Follower operating alone. Coherency remained solid, averaging around 7.45 out of 10 with SMC; baseline methods scored 9+ on coherency, but their more fluent outputs failed to satisfy the constraints.

Overall, the work introduces a fresh direction in language modeling in which models not only generate answers but also devise how those answers should be computed. By letting the Planner generate code that structures reasoning and having Followers execute this code in parallel, the method achieves precision, adaptability, and fluency without requiring larger models or manual engineering. The results illustrate a clear path for smaller language models to perform beyond what their size suggests through intelligent orchestration and self-guided inference.


Here is the Paper.


