MarkTechPost@AI, 7 hours ago
Fractional Reasoning in LLMs: A New Way to Control Inference Depth

Researchers from Stanford University have proposed a new framework called Fractional Reasoning (FR), designed to improve test-time compute in LLMs through adaptive reasoning control. FR requires no training and is model-agnostic: it alters reasoning behavior by adjusting the model's internal representations, enabling flexible control over reasoning depth. Experimental results show that FR outperforms existing test-time compute methods on reasoning tasks such as GSM8K, MATH500, and GPQA, offering a more efficient and precise approach to LLM reasoning.

🧠 Fractional Reasoning (FR) is a training-free, model-agnostic framework that controls reasoning behavior by adjusting the LLM's internal representations.

💡 At its core, FR extracts the latent shift induced by reasoning-promoting inputs such as CoT or reflection prompts, and re-applies this shift with a tunable scaling factor. This lets the model adjust its reasoning depth during inference without modifying the input text or requiring fine-tuning.

📈 FR performs strongly on the GSM8K, MATH500, and GPQA benchmarks, outperforming existing test-time compute methods such as Best-of-N and Majority Vote, demonstrating its effectiveness at improving LLM reasoning performance.

🔍 Analysis of FR shows that increasing the scaling parameter produces longer outputs with more detailed multi-step reasoning, confirming the framework's predictable and continuous control over model behavior. FR is effective across a range of models, including DeepSeek-R1-Distill-Qwen-7B, demonstrating its generality across both general-purpose and specialized LLMs.

What is included in this article:
The limitations of current test-time compute strategies in LLMs.
Introduction of Fractional Reasoning (FR) as a training-free, model-agnostic framework.
Techniques for latent state manipulation using reasoning prompts and adjustable scaling.
Breadth- and depth-based scaling benefits demonstrated across GSM8K, MATH500, and GPQA.
Evaluation results showing FR’s superiority over Best-of-N and Majority Vote.
Analysis of FR’s behavior across different models, including DeepSeek-R1.

Introduction: Challenges in Uniform Reasoning During Inference

LLMs have shown improvements across various domains, with test-time compute playing a crucial role in their performance. This approach enhances reasoning during inference by allocating extra computational resources, such as generating multiple candidate responses and selecting the most suitable one, or refining answers iteratively through self-reflection. However, current test-time compute strategies treat all problems uniformly, applying the same depth of reasoning regardless of query difficulty or structure. In reality, reasoning needs vary widely: underthinking, overthinking, or excessive reflection can degrade answers or waste computation. LLMs therefore need to adjust their reasoning depth and level of reflection dynamically.

Prior Work: Latent Steering and Representation Control

Existing research has explored various methods to enhance LLM reasoning through inference-time scaling and latent state control. The Chain-of-Thought (CoT) prompting technique guides models to decompose complex problems into intermediate steps to improve reasoning performance. Outcome reward models (ORMs) and process reward models (PRMs) evaluate generated responses based on correctness or quality of internal reasoning. Moreover, representation engineering methods use steering vectors in LLM latent spaces for controlled generation, while methods like In-Context Vectors (ICV) extract latent vectors from demonstrations to steer internal states at inference time, and Representation Finetuning (ReFT) learns task-specific low-rank interventions over latent representations.

The Proposed Framework: Fractional Reasoning for Adaptive Inference

Researchers from Stanford University have proposed Fractional Reasoning (FR), a training-free and model-agnostic framework for improving test-time compute through adaptive reasoning control. FR adjusts reasoning behavior by directly modifying the model's internal representations: it extracts the latent shift induced by reasoning-promoting inputs such as CoT or reflection prompts, then re-applies this shift with a tunable scaling factor. This enables models to adjust the depth of reasoning during inference without modifying the input text or requiring fine-tuning. FR supports and enhances two key forms of test-time scaling: (a) breadth-based scaling, such as Best-of-N and majority vote, and (b) depth-based scaling, such as self-reflection.
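The core idea can be sketched as simple vector arithmetic on hidden states. The snippet below is an illustrative toy, not the paper's implementation: `fractional_shift` and the random vectors standing in for transformer hidden states are assumptions for demonstration purposes.

```python
# Toy sketch of the Fractional Reasoning idea: extract the latent shift that a
# reasoning-promoting prompt induces in a hidden state, then re-apply it with a
# tunable scaling factor alpha. Names and shapes here are illustrative; the
# actual method operates on real transformer hidden states.
import numpy as np

def fractional_shift(h_plain, h_prompted, alpha):
    """Steer a hidden state by alpha times the prompt-induced latent shift.

    h_plain    : hidden state for the query without the reasoning prompt
    h_prompted : hidden state for the query with the reasoning prompt (e.g. CoT)
    alpha      : scaling factor; alpha=0 recovers plain behavior, alpha=1
                 matches the prompted behavior, alpha>1 amplifies it
    """
    shift = h_prompted - h_plain      # latent direction induced by the prompt
    return h_plain + alpha * shift    # fractional application of the shift

# Random vectors standing in for hidden states.
rng = np.random.default_rng(0)
h_plain = rng.standard_normal(8)
h_prompted = rng.standard_normal(8)

# alpha interpolates between (and extrapolates beyond) the two behaviors.
assert np.allclose(fractional_shift(h_plain, h_prompted, 0.0), h_plain)
assert np.allclose(fractional_shift(h_plain, h_prompted, 1.0), h_prompted)
```

Varying alpha continuously is what gives "fractional" control: intermediate values yield intermediate reasoning intensity, which is how the framework modulates depth without touching the prompt text.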

Benchmarking: Performance Gains on Reasoning Tasks

FR is evaluated on three benchmarks that require multi-step reasoning: GSM8K, MATH500, and GPQA. The evaluation utilizes test sets for GSM8K and MATH500 while using the diamond split for GPQA. Main experiments use two competitive open-source instruction-tuned models: Qwen2.5-7B-Instruct and LLaMA-3.1-8B-Instruct, both of which demonstrate strong reasoning capabilities and provide access to the latent state representations required by the proposed method. FR outperforms standard test-time compute methods on all benchmarks and models, showing that it can strongly enhance performance. Adjusting the influence of prompts enables broader exploration of the solution space, increasing the efficiency of traditional test-time compute methods.
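The breadth-based setting can be illustrated with a minimal majority-vote sketch. This is an assumption-laden stand-in: `sample_answer` is a placeholder for running the steered model at a given scaling factor, and the grid of alpha values is hypothetical.

```python
# Minimal sketch of breadth-based test-time scaling via majority vote, where
# candidates are (hypothetically) sampled under different scaling factors so
# the vote covers a broader slice of the solution space.
from collections import Counter

def majority_vote(answers):
    """Return the most common answer among the candidate answers."""
    return Counter(answers).most_common(1)[0][0]

def sample_answer(question, alpha):
    # Placeholder: a real implementation would run the steered model at
    # scaling factor alpha and parse out its final answer string.
    return "42"

alphas = [0.5, 1.0, 1.5, 2.0]  # hypothetical grid of reasoning intensities
answers = [sample_answer("What is 6 * 7?", a) for a in alphas]
print(majority_vote(answers))  # prints 42
```

Because each candidate is generated under a different reasoning intensity rather than a single fixed prompt, the pool of answers is more diverse, which is the mechanism the paper credits for making vote-based aggregation more efficient.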

Behavior and Model-Agnostic Generality of Fractional Reasoning

Researchers further analyzed FR to characterize its behavioral dynamics and its generality across models. The analysis reveals that increasing the scaling parameter leads to longer outputs with more detailed multi-step reasoning, confirming that the framework steers model behavior predictably and continuously. FR remains effective even when applied to reasoning-specialized models such as DeepSeek-R1-Distill-Qwen-7B, improving accuracy over standard prompting baselines and showing its generality across both general-purpose and specialized LLMs. Performance scaling analysis shows consistent improvements with an increasing number of generations, and FR achieves higher accuracy than the majority-vote baseline across most sampling budgets.

Conclusion: Towards More Dynamic and Efficient LLM Inference

In conclusion, researchers from Stanford University introduced Fractional Reasoning (FR), a training-free and model-agnostic framework that improves test-time compute through adaptive control of reasoning behavior in LLMs. It offers a general and interpretable approach for more precise and efficient allocation of computational effort during inference, overcoming the limitation of uniform reasoning application in current test-time compute strategies. However, the framework currently depends on predefined reasoning directions and lacks automatic selection of scaling factors, indicating future research directions toward adaptive policies for fully dynamic inference.

Check out the Paper. All credit for this research goes to the researchers of this project.

