cs.AI updates on arXiv.org 07月03日
On-Policy Optimization of ANFIS Policies Using Proximal Policy Optimization
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文提出一种基于近端策略优化(PPO)的强化学习(RL)方法,用于训练神经模糊控制器。在CartPole-v1环境中,与ANFIS-DQN相比,PPO训练的模糊控制器表现更优,展现出更低的方差和更快的学习收敛速度。

arXiv:2507.01039v1 Announce Type: cross Abstract: We propose a reinforcement learning (RL) approach for training neuro-fuzzy controllers using Proximal Policy Optimization (PPO). Building on prior work that applied Deep Q-Learning to Adaptive Neuro-Fuzzy Inference Systems (ANFIS), our method replaces the off-policy value-based framework with a stable on-policy actor-critic loop. We evaluate this approach in the CartPole-v1 environment using multiple random seeds and compare its learning performance against ANFIS-Deep Q-Network (DQN) baselines. It was found that PPO-trained fuzzy agents achieved a mean return of 500 +/- 0 on CartPole-v1 after 20000 updates, showcasing less variance than prior DQN-based methods during training and overall faster convergence. These findings suggest that PPO offers a promising pathway for training explainable neuro-fuzzy controllers in reinforcement learning tasks.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

强化学习 神经模糊控制器 近端策略优化 CartPole-v1 DQN
相关文章