MarkTechPost@AI, March 5
Beyond Monte Carlo Tree Search: Unleashing Implicit Chess Strategies with Discrete Diffusion

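This article introduces DIFFUSEARCH, a discrete diffusion-based framework designed to address the limitations of large language models on tasks that require multi-step reasoning. Traditional methods such as Monte Carlo Tree Search are computationally expensive and error-prone, whereas DIFFUSEARCH trains the policy to directly predict and use future representations and refines its predictions iteratively with a diffusion model, reducing computational overhead while improving the efficiency and accuracy of long-term planning. Experiments show that DIFFUSEARCH outperforms other baselines in chess decision-making, demonstrating the potential of implicit search to replace explicit search and suggesting a new direction for improving next-token prediction in language models.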

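💡 DIFFUSEARCH is a novel framework that uses a discrete diffusion model to overcome the long-term planning challenges large language models face on complex tasks, without relying on costly explicit search algorithms such as Monte Carlo Tree Search.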

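♟️ The model was evaluated on chess decision-making and trained with supervised learning, using Stockfish as an oracle to label board states. DIFFUSEARCH outperformed the other baselines in both action accuracy and Elo rating, demonstrating its effectiveness.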

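🚀 DIFFUSEARCH adopts a simple action-state representation (s-asa) and uses self-attention with iterative denoising to progressively improve its action predictions, avoiding costly marginalization over future states at inference time and thereby improving efficiency.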

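🎯 DIFFUSEARCH's success is not limited to chess: it also points to a new direction for improving next-token prediction in language models and lays a foundation for research on implicit search in AI planning and decision-making.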

Large language models (LLMs) generate text one token at a time, which limits their ability to plan for tasks that require multiple reasoning steps, such as structured writing or problem-solving. This lack of long-term planning hurts their coherence and decision-making in complex scenarios. Some approaches evaluate several alternatives before committing to a choice, which improves prediction precision. However, they incur higher computational costs and are prone to errors when their forecasts of the future are incorrect.

Explicit search algorithms like Monte Carlo Tree Search (MCTS) and beam search are popular in AI planning and decision-making, but they have inherent limitations. They rely on repeated simulations of the future, so computation costs rise quickly, which makes them unsuitable for real-time systems. They also depend on a value model to evaluate every state; if that model is inaccurate, its errors propagate through the search. Because longer lookaheads introduce more errors, these errors accumulate and reduce decision accuracy. This is particularly problematic in complex tasks that require long-term planning, where accurate foresight is hard to maintain and outcomes suffer.
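The error build-up described above can be made concrete with a deliberately simplified model. The sketch below assumes that every step of a lookahead is predicted correctly with the same, independent probability; that assumption and the function name `rollout_accuracy` are illustrative only and do not come from the article or the paper.

```python
def rollout_accuracy(per_step_accuracy: float, depth: int) -> float:
    """Probability that every step of a depth-step lookahead is correct.

    Simplifying assumption (not from the source): each step is correct
    independently with the same probability, so the chance that the whole
    rollout is error-free is the per-step accuracy raised to the depth.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must lie in [0, 1]")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return per_step_accuracy ** depth


if __name__ == "__main__":
    # Show how an error-free rollout becomes less likely as depth grows.
    for depth in (1, 2, 4, 8):
        print(depth, round(rollout_accuracy(0.9, depth), 4))
```

Under this toy model, even a predictor that is right 90% of the time per step produces a fully correct four-step rollout only about two times in three, which is one way to read the claim that longer predictions accumulate more errors.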

To mitigate these issues, researchers from The University of Hong Kong, Shanghai Jiao Tong University, Huawei Noah’s Ark Lab, and Shanghai AI Laboratory proposed DIFFUSEARCH, a discrete diffusion-based framework that removes the need for explicit search algorithms like MCTS. Instead of relying on costly search processes, DIFFUSEARCH trains the policy to directly predict and use future representations, refining its predictions iteratively with diffusion models. Integrating the world model and the policy into a single framework reduces computational overhead while improving efficiency and accuracy in long-term planning.

The framework trains the model with supervised learning, using Stockfish as an oracle to label board states from chess games. Several future representations are examined, and the action-state (s-asa) representation is selected for its simplicity and efficiency. Rather than directly predicting future sequences, the model uses discrete diffusion modeling, applying self-attention and iterative denoising to gradually improve its action predictions. By sampling directly from the trained model, DIFFUSEARCH avoids costly marginalization over future states during inference. An easy-first decoding strategy prioritizes more predictable tokens for denoising, which enhances accuracy.
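To illustrate the idea of easy-first decoding with iterative denoising, here is a minimal sketch that uses only the Python standard library. It is not the authors' implementation: `toy_denoiser` is a hypothetical stand-in that proposes random tokens with random confidences where a trained network would go, and the rule of committing the most confident proposals first is just one plausible reading of "prioritizes more predictable tokens".

```python
import random

MASK = None  # placeholder for a still-noised (masked) position


def toy_denoiser(tokens, vocab, rng):
    """Hypothetical stand-in for a trained denoising model.

    For every masked position it proposes a token and a confidence score.
    A real model would condition on the board state and on the tokens
    that are already unmasked; this one is random on purpose.
    """
    return {
        i: (rng.choice(vocab), rng.random())
        for i, tok in enumerate(tokens)
        if tok is MASK
    }


def easy_first_decode(length, vocab, steps, seed=0):
    """Fill `length` masked positions over at most `steps` denoising steps,
    committing the most confident proposals first at each step."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(steps):
        proposals = toy_denoiser(tokens, vocab, rng)
        if not proposals:
            break  # everything is already unmasked
        remaining_steps = steps - step
        # Ceiling division: spread the remaining masked positions evenly,
        # so the final step unmasks whatever is left.
        k = -(-len(proposals) // remaining_steps)
        ranked = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (tok, _confidence) in ranked[:k]:
            tokens[pos] = tok
    return tokens


if __name__ == "__main__":
    # Illustrative vocabulary of UCI move strings.
    moves = ["e2e4", "d2d4", "g1f3", "c2c4"]
    print(easy_first_decode(length=8, vocab=moves, steps=4))
```

In this sketch the low-confidence positions stay masked longer, so a real model would get to revisit them once more context has been filled in.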

The researchers evaluated DIFFUSEARCH against three transformer-based baselines: State-Action (S-A), State-Value (S-V), and Action-Value (SA-V) models, trained using behavioral cloning, value-based decision-making, and legal action comparison, respectively. Using a dataset of 100k chess games, with states encoded in FEN format and actions in UCI notation, they implemented GPT-2-based models with an Adam optimizer, a learning rate of 3e-4, a batch size of 1024, an 8-layer architecture (7M parameters), a horizon of 4, and 20 diffusion timesteps. Evaluation metrics were action accuracy, puzzle accuracy, and Elo ratings from a 6000-game internal tournament.

DIFFUSEARCH outperformed S-A by 653 Elo and 19% in action accuracy, and it exceeded SA-V despite using 20 times fewer data records. Discrete diffusion with linear λ_t achieved the highest accuracy (41.31%), surpassing autoregressive and Gaussian methods. DIFFUSEARCH retained the ability to predict future moves, though accuracy declined at later steps, and performance improved with more attention layers and refined decoding. Positioned as an implicit search method, it was competitive with explicit MCTS-based approaches.
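To give a feel for what a 653-point Elo gap means, the sketch below applies the standard logistic Elo expected-score formula. Whether the article's internal tournament derives its ratings from exactly this formula is an assumption; the function name `elo_expected_score` is illustrative.

```python
def elo_expected_score(rating_diff: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) for a player rated
    `rating_diff` points above the opponent, under the standard logistic
    Elo model with a 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


if __name__ == "__main__":
    gap = 653  # Elo gap over the S-A baseline reported in the article
    print(f"Expected score at +{gap} Elo: {elo_expected_score(gap):.3f}")
```

Under this model, a 653-point advantage corresponds to an expected score of roughly 0.98 per game against the lower-rated player.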

In summary, the proposed model showed that implicit search via discrete diffusion can effectively replace explicit search and improve chess decision-making. It surpassed both searchless and explicit-search policies and demonstrated the potential to learn future-imitating strategies. Although it relies on an external oracle and a limited dataset, the results point to possible improvements through self-play and long-context modeling. More generally, the method could be applied to improve next-token prediction in language models, and it provides a basis for further investigation of implicit search in AI planning and decision-making.



The post Beyond Monte Carlo Tree Search: Unleashing Implicit Chess Strategies with Discrete Diffusion appeared first on MarkTechPost.
