Articles related to "sample efficiency"
Efficient Solution and Learning of Robust Factored MDPs
cs.AI updates on arXiv.org 2025-08-04T04:27:26.000000Z
Model Predictive Adversarial Imitation Learning for Planning from Observation
cs.AI updates on arXiv.org 2025-07-30T04:11:57.000000Z
Equivariant Volumetric Grasping
cs.AI updates on arXiv.org 2025-07-28T04:42:48.000000Z
Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound
cs.AI updates on arXiv.org 2025-07-16T04:28:36.000000Z
EXPO: Stable Reinforcement Learning with Expressive Policies
cs.AI updates on arXiv.org 2025-07-11T04:04:20.000000Z
Reinforcement Learning with Action Chunking
cs.AI updates on arXiv.org 2025-07-11T04:04:19.000000Z
Accelerated Online Reinforcement Learning using Auxiliary Start State Distributions
cs.AI updates on arXiv.org 2025-07-08T05:54:04.000000Z
Causal-Paced Deep Reinforcement Learning
cs.AI updates on arXiv.org 2025-07-08T04:34:02.000000Z
Trained on 16 H100s in 26 Minutes, Surpassing o1-preview! Fei-Fei Li and Colleagues Use 1K Samples to Demystify Test-Time Scaling
新智元 2025-02-08T16:15:55.000000Z
Trained on 16 H100s in 26 Minutes, Surpassing o1-preview! Fei-Fei Li and Colleagues Use 1K Samples to Demystify Test-Time Scaling
智源社区 2025-02-07T09:53:24.000000Z
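Trained on 16 H100s in 26 Minutes, Surpassing o1-preview: Fei-Fei Li and Colleagues Use 1K Samples to Demystify Test-Time Scaling
36氪 - Tech Channel 2025-02-06T10:10:53.000000Z
Training on Just 1,000 Samples Can Surpass o1: Fei-Fei Li and Colleagues Chart a New AI Scaling Curve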
机器之心 2025-02-05T07:40:09.000000Z
Streaming Deep Learning Finally Works! Strongly Recommended by Richard Sutton, the Father of Reinforcement Learning
机器之心 2024-11-30T05:39:38.000000Z
Research Highlights Collection | A Quick Look at Results from the Top Conference CoRL 2024 (Part 1)
智源社区 2024-10-25T14:39:27.000000Z
Balancing Label Quantity and Quality for Scalable Elicitation
LessWrong 2024-10-24T17:23:37.000000Z
How should we make trade-offs between the quantity and quality of labels used for eliciting knowledge from capable AI systems?
LessWrong 2024-10-24T16:53:07.000000Z
Is Unchecked Churn Holding Back Your AI Performance? This AI Paper Unveils CHAIN: Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn
MarkTechPost@AI 2024-09-19T06:05:34.000000Z
Scalable Multi-Agent Reinforcement Learning Framework for Efficient Decision-Making in Large-Scale Systems
MarkTechPost@AI 2024-09-07T08:20:14.000000Z
Scalable oversight as a quantitative rather than qualitative problem
LessWrong 2024-07-06T17:50:10.000000Z
Relational, Object-Centric Agents for Completing Simulated Household Tasks with Wilka Carvalho - #402
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) 2024-05-12T03:32:25.000000Z