"
训练效率
" 相关文章
Deployed for real on 10,000-GPU clusters and already saving millions of GPU hours: MoE communication optimization technique COMET goes open source
字节跳动技术团队
2025-03-25T12:00:59.000000Z
Moonshot AI and UCLA Researchers Release Moonlight: A 3B/16B-Parameter Mixture-of-Experts (MoE) Model Trained with 5.7T Tokens Using the Muon Optimizer
MarkTechPost@AI
2025-02-23T04:50:15.000000Z
Unexpected: is GRPO, used by DeepSeek-R1, actually suboptimal? PPO is enough for reinforcement learning training at scale
机器之心
2025-02-21T05:49:07.000000Z
DeepSeek's new paper sparks heated discussion again: what does it say?
虎嗅
2025-02-19T08:18:29.000000Z
New work from Max Tegmark's group: training interpretable AI models with harmonic loss
集智俱乐部
2025-02-13T15:38:04.000000Z
Process Reinforcement through Implicit Rewards (PRIME): A Scalable Machine Learning Framework for Enhancing Reasoning Capabilities
MarkTechPost@AI
2025-02-08T03:49:59.000000Z
New work from Saining Xie: how important is representation learning? A single tweak sets a new SOTA and makes DiT training 18x faster
硅星人Pro
2024-10-29T00:26:49.000000Z
Some rumors about the RWKV architecture
RWKV元始智能
2024-10-28T00:09:59.000000Z
New work from Saining Xie: how important is representation learning? A single tweak sets a new SOTA and makes DiT training 18x faster
智源社区
2024-10-24T03:23:48.000000Z
Nvidia AI Introduces the Normalized Transformer (nGPT): A Hypersphere-based Transformer Achieving 4-20x Faster Training and Improved Stability for LLMs
MarkTechPost@AI
2024-10-19T22:20:49.000000Z
Revisiting Recurrent Neural Networks (RNNs): Minimal LSTMs and GRUs for Efficient Parallel Training
MarkTechPost@AI
2024-10-07T23:21:32.000000Z
Tencent launches next-generation large model "Hunyuan Turbo": much stronger performance at 50% lower pricing
36氪
2024-09-05T03:45:41.000000Z
Sparse Maximal Update Parameterization (SμPar): Optimizing Sparse Neural Networks for Superior Training Dynamics and Efficiency
MarkTechPost@AI
2024-06-04T06:01:02.000000Z