MarkTechPost@AI August 13, 2024
Self-play muTuAl Reasoning (rStar): A Novel AI Approach that Boosts Small Language Models (SLMs)’ Reasoning Capability during Inference without Fine-Tuning

rStar is a novel AI approach that strengthens the reasoning capability of small language models (SLMs) at inference time through a self-play mutual reasoning process, without fine-tuning or reliance on stronger models. rStar uses Monte Carlo Tree Search (MCTS) to self-generate reasoning steps and introduces a discrimination process called "mutual consistency," in which a second SLM evaluates the generated reasoning trajectories and effectively guides exploration. rStar achieves strong results across a range of reasoning benchmarks and language models, outperforming existing methods and demonstrating its potential on reasoning tasks.

🤔 **Core mechanism:** rStar uses Monte Carlo Tree Search (MCTS) to self-generate reasoning steps and introduces a discrimination process called "mutual consistency," in which a second SLM evaluates the generated reasoning trajectories to guide exploration effectively.

💡 **Advantages:** rStar substantially improves SLM reasoning: on the GSM8K dataset, LLaMA2-7B's accuracy rises from 12.51% with few-shot CoT to 63.91% with rStar, approaching fine-tuned performance. These gains hold across a range of reasoning benchmarks and language models, where rStar outperforms existing methods.

💪 **Applications:** rStar can be applied to any task that demands reasoning ability, such as question answering, text generation, and code generation. It offers a new way to strengthen SLM reasoning and opens new possibilities for AI in reasoning tasks.

🚀 **Outlook:** rStar's results point to a new direction for improving SLM reasoning; future work can explore its application across different domains and tasks, and investigate how to further improve its efficiency and performance.

📊 **Experimental results:** rStar delivers notable gains across reasoning benchmarks and language models, outperforming existing methods; on GSM8K, LLaMA2-7B improves from 12.51% with few-shot CoT to 63.91% with rStar, nearly matching fine-tuned performance. rStar also achieves significant improvements on challenging mathematical datasets such as GSM-Hard and MATH-500, indicating its potential on complex reasoning problems.

Large language models (LLMs) have made significant strides in various applications, but they continue to face substantial challenges in complex reasoning tasks. For instance, even advanced models like Mistral-7B achieve only 36.5% accuracy on the GSM8K dataset, despite employing techniques such as Chain-of-Thought (CoT). While fine-tuning has shown promise in improving reasoning capabilities, most LLMs rely on data distilled or synthesized by superior models like GPT-4. This dependency on more advanced models has led researchers to explore alternative ways to enhance reasoning without a superior teacher LLM. However, this endeavor presents its own challenges, particularly for smaller language models (SLMs), which struggle with effective solution-space exploration and with assessing the quality of intermediate reasoning steps.

Researchers have made various attempts to enhance the reasoning capabilities of language models. Prompting-based methods, such as Chain-of-Thought, focus on designing instructions and pipelines to improve performance during inference. These approaches include planning, problem decomposition, abstraction, and programming techniques. In addition, self-improvement methods have gained traction, with fine-tuning approaches utilizing pre-trained LLMs to synthesize data and progressively enhance performance. Advanced prompting techniques like self-verification and RAP aim to improve performance through iterative self-exploration. Sampling diverse reasoning paths has shown promise in mathematical reasoning tasks, with methods like Self-Consistency and tree-search approaches breaking down tasks into simpler steps. For answer verification, majority voting is widely used, while some researchers have explored training value or reward models, though these require additional annotations and risk overfitting.
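
The majority-voting step mentioned above can be sketched in a few lines; the helper name and the string answers are illustrative, not any library's API:

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Self-Consistency-style answer verification: sample several reasoning
    paths, extract each path's final answer, and keep the most common one."""
    return Counter(answers).most_common(1)[0][0]

# Three sampled reasoning paths; two agree on "42", so it wins the vote.
final = majority_vote(["42", "41", "42"])
```

Note the limitation the article alludes to: a majority vote only measures agreement among samples, not correctness, which is what motivates trained verifiers and, in rStar, the mutual-consistency discriminator.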

Researchers from Microsoft Research Asia and Harvard University introduced the Self-play muTuAl Reasoning (rStar) approach, a robust solution to enhance SLMs' reasoning capabilities during inference without relying on fine-tuning or superior models. rStar tackles the challenges faced by SLMs through a unique self-play mutual generation-discrimination process. This method employs a conventional Monte Carlo Tree Search (MCTS) for self-generating reasoning steps but expands the set of reasoning actions to simulate human reasoning behaviors. These actions include decomposing problems, searching for specific reasoning steps, proposing new sub-questions, and rephrasing given questions. To guide the exploration of generated reasoning trajectories effectively, rStar introduces a discrimination process called mutual consistency, which employs a second SLM as a discriminator to provide unsupervised feedback on candidate reasoning trajectories.

The rStar approach employs a unique architecture to enhance SLMs' reasoning capabilities. At its core, rStar uses an MCTS algorithm to augment the target SLM for self-generating multi-step reasoning solutions. The method introduces a rich set of five human-like reasoning actions, including proposing one-step thoughts, generating remaining thought steps, proposing and answering sub-questions, re-answering sub-questions, and rephrasing questions. This diverse action space allows for thorough exploration across various reasoning tasks.

rStar implements a carefully designed reward function that evaluates each action’s value without relying on self-rewarding techniques or external supervision. The MCTS rollout process uses the Upper Confidence Bounds applied to Trees (UCT) algorithm to balance exploration and exploitation during tree expansion. To verify the generated reasoning trajectories, rStar introduces a second SLM as a discriminator, employing a mutual consistency approach. This process involves masking part of a candidate trajectory and asking the discriminator SLM to complete it, then comparing the results for consistency.

The results demonstrate the effectiveness of rStar across various reasoning benchmarks and language models:

1. Performance on diverse reasoning tasks: rStar achieves state-of-the-art accuracy across five diverse reasoning tasks; on GSM8K, LLaMA2-7B improves from 12.51% with few-shot CoT to 63.91% with rStar, nearly matching fine-tuned performance.

2. Efficiency: the gains come purely at inference time, without any fine-tuning, and rStar outperforms existing multi-round prompting and self-improvement techniques.

3. Performance on challenging mathematical datasets: rStar also yields significant improvements on harder datasets such as GSM-Hard and MATH-500.

4. Ablation studies: extensive ablations analyze the contribution of the method's components.

5. Model comparisons: the improvements hold across five different SLMs, including LLaMA2-7B.

These results highlight rStar’s effectiveness in enhancing SLMs’ reasoning capabilities across various tasks and models, outperforming existing methods in both accuracy and efficiency.

The rStar approach introduces a robust generator-discriminator self-play method that significantly enhances the reasoning capabilities of SLMs during inference. This research reveals that SLMs like LLaMA2-7B possess strong inherent reasoning abilities even before domain-specific supervised fine-tuning. rStar demonstrates state-of-the-art performance across five different SLMs and five diverse reasoning tasks, substantially outperforming existing multi-round prompting and self-improvement techniques. The extensive ablation studies and analysis conducted in this research contribute valuable insights to the field, paving the way for more advanced self-improved reasoning techniques in SLMs. These findings highlight the potential of rStar in unlocking the latent reasoning capabilities of language models without the need for extensive fine-tuning or reliance on larger models.


Check out the Paper. All credit for this research goes to the researchers of this project.



