RL_Fishai

Trending

Articles related to "RL"

The LLM Talent-Poaching Bloodbath: Reinforcement Learning Prodigies Hired Away, Leaving the Field a "No Man's Land" Overnight

36kr 2025-08-04T07:24:36.000000Z

The Second Half for Foundation Models: Open Source, Talent, and Model Evaluation. What Are Today's Key Questions?

智源社区 2025-08-02T03:14:24.000000Z

A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning

cs.AI updates on arXiv.org 2025-07-14T04:08:32.000000Z

slime: An SGLang-Native Post-Training Framework for RL Scaling

Large Model Systems Organization 2025-07-11T20:29:23.000000Z

DeepSeek-R1's Technical Breakthrough: Can Pure RL Training Really Elicit a "Reflection" Ability in Large Models?

掘金人工智能 2025-07-10T08:00:13.000000Z

When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning

cs.AI updates on arXiv.org 2025-07-08T04:33:57.000000Z

Foom & Doom 2: Technical alignment is hard

少点错误 2025-06-23T17:22:35.000000Z

OpenAI's Approach Questioned; Meta Researcher: Superintelligence Simply Cannot Be Built

虎嗅-AI 2025-06-23T02:33:52.000000Z

What I've been reading (#1)

Interconnects 2025-06-21T15:30:54.000000Z

OpenAI's Approach Questioned! Meta Researcher: Superintelligence Simply Cannot Be Built

智源社区 2025-06-21T13:22:09.000000Z

Reinforcement learning and general intelligence

Artificial Fintelligence 2025-06-05T15:40:30.000000Z

A Masterclass on "Reinforcement Learning" | 42章经

42章经 2025-05-14T18:11:35.000000Z

The First Half of Agent Development: How Environment, Tools, and Context Determine the Agent | 42章经

42章经 2025-05-13T18:26:38.000000Z

Open Source RL training landscape grows

Coding with Intelligence 2025-05-09T20:31:04.000000Z

Tsinghua paper: Does RL Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

少点错误 2025-05-05T19:02:29.000000Z

The First Half of Agent Development: How Environment, Tools, and Context Determine the Agent | 42章经

42章经 2025-04-28T00:36:29.000000Z

A Masterclass on "Reinforcement Learning" | 42章经

42章经 2025-04-13T18:41:20.000000Z

The Long, Long, Long, Long, Long Thinking Behind Kimi k1.5

月之暗面 Kimi 2025-04-09T10:06:20.000000Z

Looking Ahead to Possible Paths After RL Post-Training, Based on How Higher Animals Learn

孔某人的低维认知 2025-04-09T09:50:59.000000Z

60,500 Times Smaller, but Stronger; AI's "Curse of Depth"

掘金人工智能 2025-04-01T11:32:47.000000Z
