Sample Efficiency_Fishai


Articles related to "Sample Efficiency"

Efficient Solution and Learning of Robust Factored MDPs

cs.AI updates on arXiv.org 2025-08-04T04:27:26.000000Z

Model Predictive Adversarial Imitation Learning for Planning from Observation

cs.AI updates on arXiv.org 2025-07-30T04:11:57.000000Z

Equivariant Volumetric Grasping

cs.AI updates on arXiv.org 2025-07-28T04:42:48.000000Z

Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound

cs.AI updates on arXiv.org 2025-07-16T04:28:36.000000Z

EXPO: Stable Reinforcement Learning with Expressive Policies

cs.AI updates on arXiv.org 2025-07-11T04:04:20.000000Z

Reinforcement Learning with Action Chunking

cs.AI updates on arXiv.org 2025-07-11T04:04:19.000000Z

Accelerated Online Reinforcement Learning using Auxiliary Start State Distributions

cs.AI updates on arXiv.org 2025-07-08T05:54:04.000000Z

Causal-Paced Deep Reinforcement Learning

cs.AI updates on arXiv.org 2025-07-08T04:34:02.000000Z

26 Minutes of Training on 16 H100s Beats o1-preview! Fei-Fei Li and Colleagues Use 1K Samples to Demystify Test-Time Scaling

新智元 2025-02-08T16:15:55.000000Z

26 Minutes of Training on 16 H100s Beats o1-preview! Fei-Fei Li and Colleagues Use 1K Samples to Demystify Test-Time Scaling

智源社区 2025-02-07T09:53:24.000000Z

26 Minutes of Training on 16 H100s Beats o1-preview: Fei-Fei Li and Colleagues Use 1K Samples to Demystify Test-Time Scaling

36氪 - Technology Channel 2025-02-06T10:10:53.000000Z

Training on Just 1,000 Samples Can Surpass o1: Fei-Fei Li and Colleagues Chart a New AI Scaling Curve

机器之心 2025-02-05T07:40:09.000000Z

Streaming Deep Learning Finally Works! Strongly Recommended by Richard Sutton, the Father of Reinforcement Learning

机器之心 2024-11-30T05:39:38.000000Z

Research Highlights Collection | A Quick Look at Results from the Top Conference CoRL 2024 (Part 1)

智源社区 2024-10-25T14:39:27.000000Z

Balancing Label Quantity and Quality for Scalable Elicitation

LessWrong 2024-10-24T17:23:37.000000Z

How should we make trade-offs between the quantity and quality of labels used for eliciting knowledge from capable AI systems?

LessWrong 2024-10-24T16:53:07.000000Z

Is Unchecked Churn Holding Back Your AI Performance? This AI Paper Unveils CHAIN: Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn

MarkTechPost@AI 2024-09-19T06:05:34.000000Z

Scalable Multi-Agent Reinforcement Learning Framework for Efficient Decision-Making in Large-Scale Systems

MarkTechPost@AI 2024-09-07T08:20:14.000000Z

Scalable oversight as a quantitative rather than qualitative problem

LessWrong 2024-07-06T17:50:10.000000Z

Relational, Object-Centric Agents for Completing Simulated Household Tasks with Wilka Carvalho - #402

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) 2024-05-12T03:32:25.000000Z
