"
混合专家
" 相关文章
Fine-tune large models on a single GPU! Memory footprint cut to 1/8 with performance fully intact | ICML 2025 (智源社区, 2025-05-29)
Scaling laws for native multimodal models: rethinking architecture choices and training efficiency (集智俱乐部, 2025-05-14)
Apple proposes a native multimodal scaling law! Early fusion + MoE, the secret weapons behind soaring performance (智源社区, 2025-05-06)
Exploring the Transformer, Part 21: MoE (掘金 人工智能, 2025-03-31)
SYMBOLIC-MOE: Mixture-of-Experts MoE Framework for Adaptive Instance-Level Mixing of Pre-Trained LLM Experts (MarkTechPost@AI, 2025-03-16)
ByteDance AI Introduces Doubao-1.5-Pro Language Model with a ‘Deep Thinking’ Mode and Matches GPT 4o and Claude 3.5 Sonnet Benchmarks at 50x Cheaper (MarkTechPost@AI, 2025-01-26)
Monet: Mixture of Monosemantic Experts for Transformers Explained (少点错误, 2025-01-25)
Mixture-of-Denoising Experts (MoDE): A Novel Generalist MoE-based Diffusion Policy (MarkTechPost@AI, 2025-01-03)
DeepSeek v3: The Six Million Dollar Model (少点错误, 2024-12-31)
50 diagrams for an intuitive understanding of Mixture-of-Experts (MoE) large models (OneFlow, 2024-11-29)
New work from Jurgen, Manning, and other leading researchers: MoE reshapes the six-year-old Universal Transformer for an efficiency upgrade (机器之心, 2024-10-19)
XVERSE-MoE-A36B Released by XVERSE Technology: A Revolutionary Multilingual AI Model Setting New Standards in Mixture-of-Experts Architecture and Large-Scale Language Processing (MarkTechPost@AI, 2024-09-15)
Mixture-of-Experts (MoE) Architectures: Transforming Artificial Intelligence AI with Open-Source Frameworks (MarkTechPost@AI, 2024-09-07)
Wolf: A Mixture-of-Experts Video Captioning Framework that Outperforms GPT-4V and Gemini-Pro-1.5 in General Scenes, Autonomous Driving, and Robotics Videos (MarkTechPost@AI, 2024-08-03)
Algorithms, systems, and applications: a full picture of Mixture-of-Experts (MoE) from three perspectives (机器之心, 2024-07-27)
DeepSeek AI Researchers Propose Expert-Specialized Fine-Tuning, or ESFT, to Reduce Memory by up to 90% and Time by up to 30% (MarkTechPost@AI, 2024-07-06)
NVIDIA economics: for every $1 cloud providers spend on our GPUs, they make $7! (快科技资讯, 2024-07-01)
Two AI Releases SUTRA: A Multilingual AI Model Improving Language Processing in Over 30 Languages for South Asian Markets (MarkTechPost@AI, 2024-06-30)
Training on a Dime: MEFT Achieves Performance Parity with Reduced Memory Footprint in LLM Fine-Tuning (MarkTechPost@AI, 2024-06-12)