Supervised Fine-Tuning_Fishai

Articles related to "Supervised Fine-Tuning"

Datasets and Recipes for Video Temporal Grounding via Reinforcement Learning

cs.AI updates on arXiv.org 2025-07-25T04:28:46.000000Z

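Meituan Proposes a New Paradigm for Multimodal Reasoning: An Unconventional RL+SFT Ordering Breaks Through Traditional Training Bottlenecks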

智源社区 2025-07-22T13:53:23.000000Z

Taught by an NVIDIA Expert! Stanford's Andrew Ng: Course on Post-Training of Large Language Models Released Online

Datawhale 2025-07-20T08:43:56.000000Z

Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved)

cs.AI updates on arXiv.org 2025-07-18T04:14:04.000000Z

Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them

cs.AI updates on arXiv.org 2025-07-16T04:28:50.000000Z

智源TALK | Improving and Transferring Mathematical Reasoning Ability, New York University

智源社区 2025-07-15T07:45:47.000000Z

Understand in 5 Minutes How to Build a ChatGPT from Scratch

掘金人工智能 2025-07-14T10:38:58.000000Z

Understand in 5 Minutes How to Build a ChatGPT from Scratch

掘金人工智能 2025-07-14T00:43:29.000000Z

Narrowing the Gap: Supervised Fine-Tuning of Open-Source LLMs as a Viable Alternative to Proprietary Models for Pedagogical Tools

cs.AI updates on arXiv.org 2025-07-09T04:01:37.000000Z

Is Drilling Math Problems Actually Harmful for Large Models? CMU Evaluates 20+ Models and Points Out a Training Pitfall

量子位 2025-07-08T06:00:54.000000Z

极客说 | Choosing Between Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT), and Optimizing Reward Functions

掘金人工智能 2025-06-22T09:02:23.000000Z

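Is SFT Doing More Harm Than Good? New Study: Applying Reinforcement Learning Directly Gives Models a Higher Ceiling on Multimodal Reasoning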

机器之心 2025-06-01T08:01:31.000000Z

100 Days After the DeepSeek-R1 Release: A Comprehensive Review of Reasoning-Model Replication Studies and What Comes Next!

PaperAgent 2025-05-14T14:58:08.000000Z

100 Days Into the "Reasoning Revolution": DeepSeek-R1 Replication Studies Fully Revealed!

智源社区 2025-05-07T00:48:00.000000Z

100 Days Into the "Reasoning Revolution": DeepSeek-R1 Replication Studies Fully Revealed!

掘金人工智能 2025-05-06T09:03:14.000000Z

USTC at ICLR 2025: With Only 5% of the Training Data in a Specific Domain, Knowledge Accuracy Improves by 14%

量子位 2025-04-09T10:19:20.000000Z

USTC at ICLR 2025: With Only 5% of the Training Data in a Specific Domain, Knowledge Accuracy Improves by 14%

智源社区 2025-04-08T08:24:07.000000Z

USTC at ICLR 2025: With Only 5% of the Training Data in a Specific Domain, Knowledge Accuracy Improves by 14%

36kr 2025-04-07T09:27:41.000000Z

Learning About "Reasoning Large Language Models" Using DeepSeek R1 as an Example [Translated]

宝玉的分享 2025-02-17T14:48:56.000000Z

This AI Paper Explores Long Chain-of-Thought Reasoning: Enhancing Large Language Models with Reinforcement Learning and Supervised Fine-Tuning

MarkTechPost@AI 2025-02-11T06:58:02.000000Z

Copyright © 2019 FISHAI. All Rights Reserved