NVIDIA Blog | February 16
DeepSeek-R1 Now Live With NVIDIA NIM

DeepSeek-R1 is an open model with exceptional reasoning capabilities. It reasons through a chain-of-thought approach to generate the best answer. The model is a perfect example of test-time scaling, demonstrating why accelerated computing is critical for the demands of agentic AI inference. R1 delivers leading accuracy on tasks such as logical inference, math, coding and language understanding while also providing efficient inference. The 671-billion-parameter DeepSeek-R1 model is now available as an NVIDIA NIM microservice, so developers can securely experiment with it and build their own specialized agents.

💡 By iteratively "thinking" through a problem, the DeepSeek-R1 model produces more output tokens and longer generation cycles, continually improving model quality. Significant test-time compute is critical for real-time inference and for higher-quality responses from reasoning models, which in turn requires larger inference deployments.

🧮 DeepSeek-R1 is a large mixture-of-experts (MoE) model with an impressive 671 billion parameters, 10x more than many other popular open-source LLMs, and supports a large input context length of 128,000 tokens. Each layer of R1 has 256 experts, and each token is routed to eight separate experts in parallel for evaluation.

🚀 With the software optimizations in the NVIDIA NIM microservice, a single server with eight H200 GPUs connected via NVLink and NVLink Switch can run the full 671-billion-parameter DeepSeek-R1 model at up to 3,872 tokens per second. This throughput is made possible by the NVIDIA Hopper architecture's FP8 Transformer Engine and the 900 GB/s of NVLink bandwidth for MoE expert communication.

🛡️ Enterprises can maximize security and data privacy by running the NIM microservice on their preferred accelerated computing infrastructure. Using NVIDIA AI Foundry with NVIDIA NeMo software, enterprises will also be able to create customized DeepSeek-R1 NIM microservices for specialized AI agents.

DeepSeek-R1 is an open model with state-of-the-art reasoning capabilities. Instead of offering direct responses, AI models like DeepSeek-R1 perform reasoning through the chain-of-thought method to generate the best answer.

Performing this sequence of inference passes — using reason to arrive at the best answer — is known as test-time scaling. DeepSeek-R1 is a perfect example of this scaling law, demonstrating why accelerated computing is critical for the demands of agentic AI inference.

As models are allowed to iteratively “think” through the problem, they create more output tokens and longer generation cycles, so model quality continues to scale. Significant test-time compute is critical to enable both real-time inference and higher-quality responses from reasoning models like DeepSeek-R1, requiring larger inference deployments.
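As a rough illustration of why longer reasoning traces are compute-hungry, the sketch below applies the common approximation that transformer decoding costs about 2 FLOPs per active parameter per generated token. The 37-billion active-parameter figure is the widely reported per-token activation for DeepSeek-R1's MoE architecture, not a number from this article; treat the output as back-of-envelope estimates rather than benchmarks.

```python
# Back-of-envelope: decode compute scales roughly linearly with the
# number of generated tokens, so longer "thinking" traces cost
# proportionally more FLOPs. Uses the common ~2 * active_params
# FLOPs-per-token approximation for transformer decoding.
ACTIVE_PARAMS = 37e9  # widely reported params activated per token (assumption)

def decode_flops(output_tokens: int, active_params: float = ACTIVE_PARAMS) -> float:
    """Approximate FLOPs needed to generate `output_tokens` tokens."""
    return 2 * active_params * output_tokens

for tokens in (1_000, 10_000, 100_000):
    print(f"{tokens:>7,} tokens -> ~{decode_flops(tokens):.2e} FLOPs")
```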

R1 delivers leading accuracy for tasks demanding logical inference, reasoning, math, coding and language understanding while also delivering high inference efficiency.

To help developers securely experiment with these capabilities and build their own specialized agents, the 671-billion-parameter DeepSeek-R1 model is now available as an NVIDIA NIM microservice preview on build.nvidia.com. The DeepSeek-R1 NIM microservice can deliver up to 3,872 tokens per second on a single NVIDIA HGX H200 system.
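For illustration, here is a minimal sketch of querying the hosted preview through its OpenAI-compatible endpoint. The base URL and model ID below follow build.nvidia.com conventions at the time of writing and may change; check the model card for the current values.

```python
import os
from openai import OpenAI

# Point the standard OpenAI client at the hosted NIM preview endpoint
# (base URL and model ID are assumptions based on build.nvidia.com).
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],  # key generated on build.nvidia.com
)

# Stream the response so the chain-of-thought appears as it is generated.
stream = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1",
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.8?"}],
    temperature=0.6,
    max_tokens=4096,
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```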

Developers can test and experiment with the application programming interface (API), which is expected to be available soon as a downloadable NIM microservice, part of the NVIDIA AI Enterprise software platform.

The DeepSeek-R1 NIM microservice simplifies deployments with support for industry-standard APIs. Enterprises can maximize security and data privacy by running the NIM microservice on their preferred accelerated computing infrastructure. Using NVIDIA AI Foundry with NVIDIA NeMo software, enterprises will also be able to create customized DeepSeek-R1 NIM microservices for specialized AI agents.
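As an illustration of that industry-standard API support, the sketch below queries a self-hosted deployment instead of the hosted preview. It assumes the microservice is serving on port 8000, a common NIM default; adjust the host, port and model name to match your deployment.

```python
import requests

# Query a self-hosted DeepSeek-R1 NIM through its OpenAI-style REST API.
# localhost:8000 is an assumption (a common NIM default), not a value
# from this article.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "deepseek-ai/deepseek-r1",
        "messages": [
            {"role": "user", "content": "Prove that the square root of 2 is irrational."}
        ],
        "max_tokens": 2048,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```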

DeepSeek-R1 — a Perfect Example of Test-Time Scaling

DeepSeek-R1 is a large mixture-of-experts (MoE) model. It incorporates an impressive 671 billion parameters — 10x more than many other popular open-source LLMs — supporting a large input context length of 128,000 tokens. The model also uses an extreme number of experts per layer. Each layer of R1 has 256 experts, with each token routed to eight separate experts in parallel for evaluation.
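To make the routing concrete, here is a generic top-k mixture-of-experts sketch with 256 experts per layer and top-8 dispatch per token, matching the figures above. It is illustrative only: DeepSeek-R1's production router differs in its details (for example, shared experts and load-balancing terms), and the per-token loop is written for clarity rather than speed.

```python
import torch

NUM_EXPERTS, TOP_K, HIDDEN = 256, 8, 1024  # per-layer figures quoted above

router = torch.nn.Linear(HIDDEN, NUM_EXPERTS, bias=False)
experts = torch.nn.ModuleList(
    torch.nn.Linear(HIDDEN, HIDDEN) for _ in range(NUM_EXPERTS)
)

def moe_layer(x: torch.Tensor) -> torch.Tensor:
    """Route each token in x (tokens, HIDDEN) to its top-8 of 256 experts."""
    scores = router(x).softmax(dim=-1)             # (tokens, NUM_EXPERTS)
    weights, idx = scores.topk(TOP_K, dim=-1)      # each token's top 8 experts
    weights = weights / weights.sum(-1, keepdim=True)  # renormalize gate weights
    out = torch.zeros_like(x)
    for t in range(x.size(0)):                     # per-token loop, clarity over speed
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[int(e)](x[t])    # weighted sum of expert outputs
    return out

print(moe_layer(torch.randn(4, HIDDEN)).shape)     # torch.Size([4, 1024])
```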

Delivering real-time answers for R1 requires many GPUs with high compute performance, connected with high-bandwidth and low-latency communication to route prompt tokens to all the experts for inference. Combined with the software optimizations available in the NVIDIA NIM microservice, a single server with eight H200 GPUs connected using NVLink and NVLink Switch can run the full, 671-billion-parameter DeepSeek-R1 model at up to 3,872 tokens per second. This throughput is made possible by using the NVIDIA Hopper architecture’s FP8 Transformer Engine at every layer — and the 900 GB/s of NVLink bandwidth for MoE expert communication.
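A quick back-of-envelope check (our arithmetic, not NVIDIA's published sizing) helps explain why FP8 matters for fitting the model on a single eight-GPU system: at one byte per parameter, the weights alone occupy roughly 671 GB. The 141 GB HBM3e per H200 is taken from NVIDIA's public H200 specs, and the calculation ignores KV-cache and activation memory.

```python
PARAMS = 671e9          # total DeepSeek-R1 parameters
FP8_BYTES = 1           # one byte per parameter at FP8
H200_HBM_GB = 141       # HBM3e per H200 GPU (from public specs, an assumption here)
NUM_GPUS = 8
TOKENS_PER_SEC = 3_872  # full-system throughput quoted above

weights_gb = PARAMS * FP8_BYTES / 1e9
total_hbm_gb = H200_HBM_GB * NUM_GPUS
print(f"FP8 weights: ~{weights_gb:.0f} GB of {total_hbm_gb} GB total HBM3e")
print(f"Per-GPU share of throughput: ~{TOKENS_PER_SEC / NUM_GPUS:.0f} tokens/s")
```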

Getting every floating point operation per second (FLOPS) of performance out of a GPU is critical for real-time inference. The next-generation NVIDIA Blackwell architecture will give test-time scaling on reasoning models like DeepSeek-R1 a giant boost with fifth-generation Tensor Cores that can deliver up to 20 petaflops of peak FP4 compute performance and a 72-GPU NVLink domain specifically optimized for inference.

Get Started Now With the DeepSeek-R1 NIM Microservice

Developers can experience the DeepSeek-R1 NIM microservice, now available on build.nvidia.com.

With NVIDIA NIM, enterprises can deploy DeepSeek-R1 with ease and ensure they get the high efficiency needed for agentic AI systems.

See notice regarding software product information.
