MarkTechPost@AI · February 25
DeepSeek AI Releases DeepEP: An Open-Source EP Communication Library for MoE Model Training and Inference

DeepSeek AI has released DeepEP, a communication library designed specifically for MoE models and expert parallelism (EP) that tackles the efficiency of communication between GPUs. DeepEP provides high-throughput, low-latency GPU kernels that optimize data exchange during training and inference. The library supports low-precision operations (including FP8), works in both intranode and internode settings, and covers different scenarios through two kernel types, normal kernels and low-latency kernels, significantly improving the efficiency and performance of large-scale language model deployment.

🚀 DeepEP is a communication library built by DeepSeek AI specifically for MoE models and expert parallelism; it optimizes data exchange between GPUs to make model training and inference more efficient.

🚄 DeepEP provides two main kernel types: normal kernels target high-throughput scenarios such as the inference prefilling phase or training, using NVLink and RDMA to reach up to 153 GB/s intranode and 43-47 GB/s internode; low-latency kernels focus on real-time applications, achieving dispatch latency as low as 163 microseconds over RDMA.

⚙️ DeepEP offers flexible, adaptive configuration: users can adjust the number of SMs or set environment variables to manage traffic isolation, and its low-precision (FP8) support reduces the memory footprint and accelerates data transfer, enabling more efficient model deployment in resource-constrained environments.

Large language models that use the Mixture-of-Experts (MoE) architecture have enabled significant increases in model capacity without a corresponding rise in computation. However, this approach also introduces challenges—especially when it comes to communication between GPUs. In MoE models, only a subset of experts is active for any given token, so efficiently exchanging data among devices is critical. Traditional methods for all-to-all communication can create bottlenecks that increase latency and underutilize GPU resources. In latency-sensitive settings, such as real-time inference, even small delays can affect overall performance. Moreover, while low-precision operations (such as FP8) help reduce memory usage, they require careful optimization to maintain model quality. These issues underscore the need for a communication library tailored to the specific demands of expert parallelism.
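To make the bottleneck concrete, the sketch below spells out the naive all-to-all token dispatch that such methods reduce to, written in plain PyTorch. It illustrates the pattern DeepEP optimizes; it is not DeepEP code, and the function and variable names are ours.

    # Naive MoE token dispatch via all-to-all in plain PyTorch -- the
    # pattern that specialized kernels like DeepEP's are built to speed up.
    # Names and shapes here are illustrative, not DeepEP's API.
    import torch
    import torch.distributed as dist

    def naive_dispatch(tokens, dest_rank, world_size):
        # tokens: [num_tokens, hidden]; dest_rank: destination rank per token.
        order = torch.argsort(dest_rank)          # group tokens by destination
        send_buf = tokens[order]
        send_counts = torch.bincount(dest_rank, minlength=world_size)

        # Exchange per-rank token counts so receive buffers can be sized.
        recv_counts = torch.empty_like(send_counts)
        dist.all_to_all_single(recv_counts, send_counts)

        # The all-to-all proper: every rank sends a slice to every other
        # rank. At scale, this single step is the communication bottleneck.
        recv_buf = tokens.new_empty(int(recv_counts.sum()), tokens.shape[1])
        dist.all_to_all_single(recv_buf, send_buf,
                               recv_counts.tolist(), send_counts.tolist())
        return recv_buf, order, send_counts.tolist(), recv_counts.tolist()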

DeepSeek AI has recently introduced DeepEP, a communication library specifically designed for MoE models and expert parallelism (EP). DeepEP addresses the inefficiencies inherent in how tokens are dispatched and aggregated across GPUs. The library provides high-throughput, low-latency all-to-all GPU kernels—commonly referred to as MoE dispatch and combine kernels—that streamline data exchange during both training and inference. Notably, DeepEP supports low-precision operations (including FP8), aligning with techniques detailed in the DeepSeek-V3 paper. This release responds directly to the challenges of scaling MoE architectures in both intranode and internode environments.
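For intuition on what the combine side does, here is the inverse of the dispatch sketch above, again in plain PyTorch rather than DeepEP's actual API: expert outputs travel back along the same all-to-all routes and are restored to each token's original position.

    # Naive MoE combine: the inverse all-to-all of the dispatch sketch.
    # This illustrates the dispatch/combine pair that DeepEP accelerates;
    # it is not DeepEP's API.
    import torch
    import torch.distributed as dist

    def naive_combine(expert_out, order, send_counts, recv_counts):
        # expert_out: outputs for the tokens this rank received in dispatch;
        # order / send_counts / recv_counts: routing metadata from dispatch.
        back_buf = expert_out.new_empty(sum(send_counts), expert_out.shape[1])
        # Reverse the routes: what each rank received, it now sends back.
        dist.all_to_all_single(back_buf, expert_out, send_counts, recv_counts)

        # Undo the sort-by-destination permutation applied during dispatch.
        restored = torch.empty_like(back_buf)
        restored[order] = back_buf
        return restored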

Technical Overview and Benefits

DeepEP offers two primary types of kernels designed to meet different operational needs:

Normal kernels are built for high-throughput scenarios, such as the prefilling phase of inference or training. Using NVLink and RDMA for forwarding, they reach up to 153 GB/s for intranode communication and around 43-47 GB/s across nodes.

Low-latency kernels are aimed at latency-sensitive inference decoding. Relying purely on RDMA, they bring dispatch latency down to as little as 163 microseconds for small batches.

DeepEP further offers flexibility through adaptive configurations. Users can adjust parameters such as the number of SMs in use or set environment variables (for example, NVSHMEM_IB_SL) to manage traffic isolation. Adaptive routing, which is currently supported in the low-latency kernels, helps distribute network traffic evenly under heavy loads, thereby improving robustness.
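As a sketch of what such a configuration can look like: NVSHMEM_IB_SL is the traffic-isolation variable named above and must be set before the communication layer initializes, while the SM-count setter shown is a hypothetical name standing in for whatever knob the library exposes; check the GitHub repository for the actual interface.

    # Configuration sketch. NVSHMEM_IB_SL is the variable named in the
    # article (the value shown is an arbitrary example); set_num_sms is a
    # hypothetical stand-in for the SM-count knob -- consult the DeepEP
    # repository for the real interface.
    import os
    os.environ["NVSHMEM_IB_SL"] = "1"   # isolate RDMA traffic on its own IB service level

    from deep_ep import Buffer          # import after the environment is set

    Buffer.set_num_sms(24)              # cap SMs used by the normal kernels (name assumed)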

Performance Insights and Practical Outcomes

The performance metrics for DeepEP are noteworthy. In typical tests using normal kernels, intranode communication can achieve throughput up to 153 GB/s, and internode setups maintain around 43–47 GB/s over RDMA. Low-latency kernels are particularly effective in production scenarios; for a batch of 128 tokens processed with eight experts, dispatch latency can be as low as 163 microseconds. Such improvements mean that the overall inference process becomes more efficient, allowing for larger batch sizes and smoother overlap between computation and communication.
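Reproducing microsecond-scale figures like these requires CUDA-event timing rather than wall-clock timing; a generic harness along the following lines, with a placeholder for the dispatch call being measured, is the usual approach.

    # Generic CUDA-event timing harness for microsecond-scale latency
    # measurements. dispatch_fn is a placeholder for whatever dispatch
    # call is being benchmarked.
    import torch

    def time_dispatch_us(dispatch_fn, warmup=10, iters=100):
        for _ in range(warmup):                 # warm up kernels and caches
            dispatch_fn()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            dispatch_fn()
        end.record()
        torch.cuda.synchronize()                # wait for timed work to finish
        return start.elapsed_time(end) * 1000.0 / iters   # ms -> us per call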

In practical terms, these optimizations lead to faster response times in inference decoding and improved throughput in training scenarios. The inclusion of FP8 support not only lowers the memory footprint but also facilitates quicker data transfers, which is essential when deploying models in environments where resources are limited.
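The arithmetic behind that is simple: FP8 stores one byte per element versus two for BF16, so the same token batch occupies half the memory and half the wire traffic. Below is a generic PyTorch illustration; the actual DeepSeek-V3 FP8 recipe also carries scaling factors, which are omitted here.

    # Why FP8 payloads move faster: one byte per element vs. two for BF16.
    # Real FP8 pipelines (e.g. DeepSeek-V3) also transmit per-block scaling
    # factors, which this illustration omits. Requires PyTorch >= 2.1.
    import torch

    x_bf16 = torch.randn(128, 7168, dtype=torch.bfloat16)  # 128 tokens (example sizes)
    x_fp8 = x_bf16.to(torch.float8_e4m3fn)

    print(x_bf16.element_size() * x_bf16.numel())  # 1835008 bytes
    print(x_fp8.element_size() * x_fp8.numel())    # 917504 bytes -- half the traffic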

Conclusion

DeepEP is a thoughtful contribution to the field of large-scale language model deployment. By addressing key communication bottlenecks in MoE architectures, it enables more efficient training and inference. Its dual-kernel approach, with one set designed for high throughput and another for low latency, offers flexibility for a range of applications. Built with support for low-precision operations and equipped with mechanisms for adaptive configuration, DeepEP provides researchers and developers with a practical tool to further optimize expert parallelism.

In summary, DeepSeek AI’s release of DeepEP represents a careful, well-engineered solution that balances performance with resource efficiency. Its design helps pave the way for more scalable and responsive AI models, supporting both academic research and real-world applications in a cost-effective manner.


Check out the GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 80k+ ML SubReddit.

