Articles related to "RL"
The LLM Talent Grab: Reinforcement Learning Talent Poached Away, Leaving the Field a "No Man's Land" Overnight
36kr
2025-08-04T07:24:36.000000Z
The Second Half for Foundation Models: Open Source, Talent, and Model Evaluation. What Are Today's Key Questions?
智源社区
2025-08-02T03:14:24.000000Z
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning
cs.AI updates on arXiv.org
2025-07-14T04:08:32.000000Z
slime: An SGLang-Native Post-Training Framework for RL Scaling
Large Model Systems Organization
2025-07-11T20:29:23.000000Z
DeepSeek-R1 Technical Breakthrough: Can Pure RL Training Really Elicit "Reflection" in Large Models?
掘金 人工智能
2025-07-10T08:00:13.000000Z
When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning
cs.AI updates on arXiv.org
2025-07-08T04:33:57.000000Z
Foom & Doom 2: Technical alignment is hard
少点错误
2025-06-23T17:22:35.000000Z
OpenAI's Approach Questioned; Meta Researcher: Superintelligence Simply Cannot Be Built
虎嗅-AI
2025-06-23T02:33:52.000000Z
What I've been reading (#1)
Interconnects
2025-06-21T15:30:54.000000Z
OpenAI's Approach Questioned! Meta Researcher: Superintelligence Simply Cannot Be Built
智源社区
2025-06-21T13:22:09.000000Z
Reinforcement learning and general intelligence
Artificial Fintelligence
2025-06-05T15:40:30.000000Z
A Master Class on "Reinforcement Learning" | 42章经
42章经
2025-05-14T18:11:35.000000Z
The First Half of Agent Development: How Environment, Tools, and Context Determine the Agent | 42章经
42章经
2025-05-13T18:26:38.000000Z
Open Source RL training landscape grows
Coding with Intelligence
2025-05-09T20:31:04.000000Z
Tsinghua paper: Does RL Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
少点错误
2025-05-05T19:02:29.000000Z
The First Half of Agent Development: How Environment, Tools, and Context Determine the Agent | 42章经
42章经
2025-04-28T00:36:29.000000Z
A Master Class on "Reinforcement Learning" | 42章经
42章经
2025-04-13T18:41:20.000000Z
The Long, Long, Long, Long, Long Thinking Behind Kimi k1.5
月之暗面 Kimi
2025-04-09T10:06:20.000000Z
Possible Paths Beyond RL Post-Training, Viewed Through the Learning Processes of Higher Animals
孔某人的低维认知
2025-04-09T09:50:59.000000Z
60,500 Times Smaller, but Stronger; AI's "Curse of Depth"
掘金 人工智能
2025-04-01T11:32:47.000000Z