Articles related to "Policy Gradient"
Off-Policy Reinforcement Learning RL with KL Divergence Yields Superior Reasoning in Large Language Models
MarkTechPost@AI
2025-06-02T04:56:04.000000Z
Microsoft VP "Teaches a Class" on X, Posting a Running Series on Everything About RL: A Must-Read for LLM Practitioners
机器之心
2025-05-26T07:35:33.000000Z
Policy Gradient Algorithms
Lil'Log
2024-11-09T05:43:41.000000Z
Using Policy Gradient for Sequential Decision Making Tasks
掘金 (Juejin) Artificial Intelligence
2024-07-05T09:16:30.000000Z