MarkTechPost@AI · 2 days ago, 14:00
Researchers from the National University of Singapore Introduce ‘Thinkless,’ an Adaptive Framework that Reduces Unnecessary Reasoning by up to 90% Using DeGRPO

 

Researchers from the National University of Singapore have introduced a new framework called Thinkless, designed to let language models dynamically decide between short and detailed reasoning. Built on reinforcement learning, the framework introduces two special control tokens: <short> for concise answers and <think> for detailed responses. Through a new algorithm called Decoupled Group Relative Policy Optimization (DeGRPO), Thinkless separates the training focus between selecting the reasoning mode and improving the accuracy of the generated response. This design prevents the model from collapsing into a single mode of behavior and enables adaptive reasoning tailored to each query. Experiments show that Thinkless significantly reduces the use of long-form reasoning while maintaining high accuracy.

💡The Thinkless framework introduces two control tokens, <short> and <think>, allowing the language model to dynamically choose a concise or detailed reasoning mode according to task complexity and thereby streamline inference.

👨‍🏫Thinkless adopts a novel algorithm called DeGRPO that decomposes training into two independent objectives: one for the control token and one for the response tokens. This avoids gradient imbalance and promotes stable learning across the different response types.

📊On the Minerva Algebra benchmark, Thinkless uses the <think> token in only 25.88% of cases while reaching 94.59% accuracy. On the GSM8K dataset, <think> usage is just 13.31%, with accuracy still at 84.18%, showing that the model adapts its reasoning depth to the complexity of the query.

🚀By cutting unnecessary token generation, Thinkless reduces reasoning overhead by up to 90% on some tasks, markedly improving the efficiency of language models.

The effectiveness of language models relies on their ability to simulate human-like step-by-step deduction. However, these reasoning sequences are resource-intensive and can be wasteful for simple questions that do not require elaborate computation. This lack of awareness regarding the complexity of the task is one of the core challenges in these models. They often default to detailed reasoning even for queries that could be answered directly. Such an approach inflates token usage, extends response times, and increases memory consumption. As a result, there is a pressing need to equip language models with a mechanism that allows them to make autonomous decisions about whether to think deeply or respond succinctly.

Current tools attempting to solve this issue either rely on manually set heuristics or prompt engineering to switch between short and long responses. Some methods use separate models and route questions based on complexity estimates. Still, these external routing systems often lack insight into the target model’s strengths and fail to make optimal decisions. Other techniques fine-tune models with prompt-based cues like “reasoning on/off,” but these rely on static rules rather than dynamic understanding. Despite some improvements, these approaches fail to enable fully autonomous and context-sensitive control within a single model.

Researchers from the National University of Singapore introduced a new framework called Thinkless, which equips a language model with the ability to dynamically decide between using short or long-form reasoning. The framework is built on reinforcement learning and introduces two special control tokens—<short> for concise answers and <think> for detailed responses. By incorporating a novel algorithm called Decoupled Group Relative Policy Optimization (DeGRPO), Thinkless separates the training focus between selecting the reasoning mode and improving the accuracy of the generated response. This design prevents the model from falling into one-dimensional behavior and enables adaptive reasoning tailored to each query.
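For intuition, here is a minimal inference sketch, not the authors' released code: it assumes a Hugging Face-style causal LM whose vocabulary already contains the <short> and <think> tokens, and the checkpoint name is hypothetical. The model itself emits the control token first, and the rest of the generation follows in the chosen mode.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint name; any Thinkless-style model with <short>/<think> in its vocabulary would do.
tokenizer = AutoTokenizer.from_pretrained("nus/thinkless-sketch")
model = AutoModelForCausalLM.from_pretrained("nus/thinkless-sketch")

def answer(question: str, max_new_tokens: int = 1024):
    inputs = tokenizer(question, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, keeping the control token visible.
    generated = output[0][inputs["input_ids"].shape[1]:]
    text = tokenizer.decode(generated, skip_special_tokens=False)
    mode = "<think>" if text.lstrip().startswith("<think>") else "<short>"
    return mode, text

Because the mode decision is simply the first token the model emits, no external router or hand-written complexity heuristic is needed at serving time.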

The methodology involves two stages: warm-up distillation and reinforcement learning. In the distillation phase, Thinkless is trained using outputs from two expert models—one specializing in short responses and the other in detailed reasoning. This stage helps the model establish a firm link between the control token and the desired reasoning format. The reinforcement learning stage then fine-tunes the model’s ability to decide which reasoning mode to use. DeGRPO decomposes the learning into two separate objectives: one for training the control token and another for refining the response tokens. This approach avoids the gradient imbalances in earlier models, where longer responses would overpower the learning signal, leading to a collapse in reasoning diversity. Thinkless ensures that both <short> and <think> tokens receive balanced updates, promoting stable learning across response types.
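A schematic way to read the decoupled objective is as two separately weighted policy-gradient terms, one over the single control token and one over the response tokens. The sketch below is an interpretation under stated assumptions, not the paper's exact formulation; the tensor names and the balancing coefficient alpha are illustrative.

import torch

def degrpo_loss(logp_ctrl, logp_resp_sum, resp_lengths, advantage, alpha=1.0):
    # logp_ctrl:      log-prob of the chosen control token (<short> or <think>), shape [B]
    # logp_resp_sum:  summed log-probs of the response tokens, shape [B]
    # resp_lengths:   number of response tokens per sample, shape [B]
    # advantage:      group-relative advantage (reward minus group mean), shape [B]
    # Mode-selection term: exactly one control token per sample, so no length normalization.
    ctrl_term = -(advantage * logp_ctrl).mean()
    # Response term: length-normalized so long <think> traces do not dominate the gradient.
    resp_term = -(advantage * (logp_resp_sum / resp_lengths)).mean()
    # alpha trades off mode selection against response quality; its value here is an assumption.
    return alpha * ctrl_term + resp_term

Keeping the two terms separate is what the article credits with avoiding the gradient imbalance in which long responses swamp the learning signal for the control token.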

When evaluated, Thinkless significantly reduced long-form reasoning while preserving high accuracy. On the Minerva Algebra benchmark, the model used the <think> token in only 25.88% of cases while achieving 94.59% accuracy. In contrast, conventional reasoning models had to use extended chains of thought much more frequently. On the AIME 2024 dataset, Thinkless reached a 27.33% accuracy rate with 100% usage of the reasoning mode, showing that it could maintain performance when full reasoning was necessary. On the GSM8K dataset, it utilized <think> only 13.31% of the time, yet still achieved 84.18% accuracy. These results reflect the model’s ability to handle simple and complex queries with appropriate reasoning depth, cutting down on unnecessary token generation by as much as 90% in some tasks.
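A rough back-of-the-envelope calculation makes those usage rates tangible. The average trace lengths below are illustrative assumptions, not numbers from the paper; only the 13.31% <think> rate comes from the GSM8K result above.

# Illustrative only: the average lengths are assumed, not reported by the paper.
avg_think_tokens = 2000   # assumed average length of a long reasoning trace
avg_short_tokens = 200    # assumed average length of a concise answer
p_think = 0.1331          # <think> usage rate reported for GSM8K

always_think = avg_think_tokens
adaptive = p_think * avg_think_tokens + (1 - p_think) * avg_short_tokens
print(f"expected tokens per query: {adaptive:.0f} vs {always_think} (always reasoning)")
print(f"relative reduction: {1 - adaptive / always_think:.0%}")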

Overall, this study from the National University of Singapore researchers presents a compelling solution to the inefficiencies of uniform reasoning in large language models. By introducing a mechanism that enables models to judge task complexity and adjust their inference strategy accordingly, Thinkless optimizes both accuracy and efficiency. The method balances depth of reasoning and response precision without relying on fixed rules, offering a data-driven approach to more intelligent language model behavior.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.



Related tags

Thinkless · Language Models · Reinforcement Learning · Adaptive Reasoning · DeGRPO