cs.AI updates on arXiv.org (13 hours ago)
An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models

This paper proposes a statistical learning approach that treats data points as interconnected and models them with a Markov reward process. By recasting supervised learning as a policy evaluation problem in reinforcement learning and introducing a generalized temporal difference learning algorithm, the authors prove its effectiveness under specific conditions and validate its practical utility through empirical studies.

arXiv:2404.15518v4 Announce Type: replace-cross

Abstract: In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.) following an unknown probability distribution. This paper presents a contrasting viewpoint, perceiving data points as interconnected and employing a Markov reward process (MRP) for data modeling. We reformulate typical supervised learning as an on-policy policy evaluation problem within reinforcement learning (RL), introducing a generalized temporal difference (TD) learning algorithm as a resolution. Theoretically, our analysis establishes connections between the solutions of linear TD learning and ordinary least squares (OLS). Under specific conditions -- particularly when the noise is correlated -- the TD solution serves as a more effective estimator than OLS. Furthermore, we show that when our algorithm is applied with many commonly used loss functions -- such as those found in generalized linear models -- it corresponds to the application of a novel and generalized Bellman operator. We prove that this operator admits a unique fixed point, and based on this, we establish convergence guarantees for our generalized TD algorithm under linear function approximation. Empirical studies verify our theoretical results, examine key design choices of our TD algorithm, and demonstrate practical utility across various datasets, encompassing tasks such as regression and image classification with deep learning.
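To make the reformulation concrete, here is a minimal sketch of the linear case: data points are visited in sequence as states of an MRP, the reward is chosen so that each regression target equals the value of its state, and plain linear TD(0) is run alongside OLS for comparison. The chain order, the reward construction r_i = y_i - gamma * y_{i+1}, and all names below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# A minimal sketch: treat supervised pairs (x_i, y_i) as successive states of
# a Markov reward process and fit them with linear TD(0). The reward below is
# chosen so that the value of state x_i equals the target y_i; this specific
# construction is an illustrative assumption, not the paper's exact recipe.

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))                    # features phi(s_i) of each "state"
theta_true = rng.normal(size=d)
y = X @ theta_true + rng.normal(scale=0.1, size=n)   # noisy regression targets

gamma = 0.9                                    # discount of the assumed MRP
# With v(s_i) = y_i, the Bellman equation v(s_i) = r_i + gamma * v(s_{i+1})
# holds if we set r_i = y_i - gamma * y_{i+1} along the chain of data points.
r = y[:-1] - gamma * y[1:]

theta = np.zeros(d)
alpha = 0.01                                   # step size
for epoch in range(50):
    for i in range(n - 1):
        phi, phi_next = X[i], X[i + 1]
        td_error = r[i] + gamma * (phi_next @ theta) - phi @ theta
        theta += alpha * td_error * phi        # linear TD(0) update

theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)    # OLS baseline

print("TD  :", np.round(theta, 3))
print("OLS :", np.round(theta_ols, 3))
print("true:", np.round(theta_true, 3))
```

In this linear setting the TD iterates approach the fixed point of A theta = b with A = sum_i phi_i (phi_i - gamma * phi_{i+1})^T and b = sum_i phi_i r_i, which is the kind of object the abstract's claimed connection between linear TD solutions and OLS concerns.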
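The abstract also states that the algorithm, paired with GLM-style losses, corresponds to a generalized Bellman operator. Below is one plausible instantiation for a logistic model, written as a hedged guess from the abstract alone: the TD target is bootstrapped through the same sigmoid inverse link used for predictions, and the update is the semi-gradient of the log-loss. The link choice, the reward construction, and all hyperparameters are assumptions, not the paper's exact operator.

```python
import numpy as np

# A hedged sketch of a "generalized" TD update for a GLM-style loss, using a
# logistic model as the example. Predictions pass through a sigmoid inverse
# link; the TD target is bootstrapped through the same link at the next state.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n, d = 400, 4
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = (sigmoid(X @ theta_true) > rng.uniform(size=n)).astype(float)  # binary labels

gamma = 0.5
# As in the linear sketch, the reward (an illustrative assumption) is chosen
# so that the label is recovered as the bootstrapped value: r_i = y_i - gamma * y_{i+1}.
r = y[:-1] - gamma * y[1:]

theta = np.zeros(d)
alpha = 0.05
for epoch in range(200):
    for i in range(n - 1):
        phi, phi_next = X[i], X[i + 1]
        pred = sigmoid(phi @ theta)                        # g(phi^T theta)
        target = r[i] + gamma * sigmoid(phi_next @ theta)  # bootstrapped through the link
        # Semi-gradient step on the log-loss: the bootstrapped target is
        # treated as a constant (no gradient flows through phi_next).
        theta += alpha * (target - pred) * phi

acc = np.mean((sigmoid(X @ theta) > 0.5) == (y > 0.5))
print(f"train accuracy of the TD-fitted logistic model: {acc:.2f}")
```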


Related tags

statistical learning, Markov reward process, reinforcement learning, temporal difference learning, data modeling