Trending
Articles related to "Supervised Fine-Tuning" (SFT)
Is SFT Doing More Harm Than Good? New Study: Going Straight to Reinforcement Learning Gives Models a Higher Ceiling for Multimodal Reasoning
机器之心 2025-06-01T08:01:31Z
100 Days After the DeepSeek-R1 Release: A Full Retrospective on Reasoning-LLM Replication Research and What Comes Next!
PaperAgent 2025-05-14T14:58:08Z
100 Days into the "Reasoning Revolution": DeepSeek-R1 Replication Research Fully Revealed!
智源社区 2025-05-07T00:48:00Z
100 Days into the "Reasoning Revolution": DeepSeek-R1 Replication Research Fully Revealed!
掘金 人工智能 2025-05-06T09:03:14Z
USTC at ICLR 2025: Only 5% of the Training Data in a Specific Domain, Yet Knowledge Accuracy Improves by 14%
量子位 2025-04-09T10:19:20Z
USTC at ICLR 2025: Only 5% of the Training Data in a Specific Domain, Yet Knowledge Accuracy Improves by 14%
智源社区 2025-04-08T08:24:07Z
USTC at ICLR 2025: Only 5% of the Training Data in a Specific Domain, Yet Knowledge Accuracy Improves by 14%
36kr 2025-04-07T09:27:41Z
Learning About "Reasoning LLMs" with DeepSeek R1 as an Example [Translation]
宝玉的分享 2025-02-17T14:48:56Z
This AI Paper Explores Long Chain-of-Thought Reasoning: Enhancing Large Language Models with Reinforcement Learning and Supervised Fine-Tuning
MarkTechPost@AI 2025-02-11T06:58:02Z
SFT Is Not a Must! Reasoning Models Can Acquire Long Chain-of-Thought Ability Through RL Alone; Tsinghua and CMU Team Cracks the Black Box
智源社区 2025-02-10T05:07:14Z
Sebastian Raschka: A Few Thoughts on DeepSeek R1 and Reasoning Models
机器之心 2025-02-09T07:54:09Z
Unraveling Direct Alignment Algorithms: A Comparative Study on Optimization Strategies for LLM Alignment
MarkTechPost@AI 2025-02-08T03:49:40Z
Researchers created an open rival to OpenAI’s o1 ‘reasoning’ model for under $50
TechCrunch News 2025-02-05T23:42:27Z
Memorization vs. Generalization: How Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) Shape Foundation Model Learning
MarkTechPost@AI 2025-01-31T21:04:55Z
SFT (Supervised Fine-Tuning) Explained in One Comprehensive Article!
智源社区 2025-01-25T13:38:11Z
From Wordle to Robotics: Q-SFT Unleashes LLMs’ Potential in Sequential Decision-Making
MarkTechPost@AI 2024-12-02T07:08:46Z
Aquila-Med LLM: A Pioneering Medical Language Model with a Fully Open-Sourced Pipeline
智源研究院 2024-10-24T17:00:57Z
Tele-FLM Series Upgraded Again! 52B Chat Model Released, and the World's First Trillion-Parameter Monolithic Dense Model Open-Sourced
智源研究院 2024-10-24T17:00:57Z
Magpie-Ultra Dataset Released: Harnessing Llama 3.1 405B for Diverse AI Instruction-Response Pairs
MarkTechPost@AI 2024-08-04T15:19:49Z