Trending
Articles related to "Policy Gradient"
Off-Policy Reinforcement Learning RL with KL Divergence Yields Superior Reasoning in Large Language Models
MarkTechPost@AI 2025-06-02T04:56:04.000000Z
Microsoft VP "Holds Class" on X, Posting a Running Series on Everything About RL: A Must-Read for LLM Practitioners
机器之心 2025-05-26T07:35:33.000000Z
Policy Gradient Algorithms
Lil'Log 2024-11-09T05:43:41.000000Z
Using Policy Gradient for Sequential Decision Making Tasks
掘金 (Juejin) Artificial Intelligence 2024-07-05T09:16:30.000000Z