NVIDIA Developer, February 16
Powering the Next Wave of DPU-Accelerated Cloud Infrastructures with NVIDIA DOCA Platform Framework

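NVIDIA has introduced the DOCA Platform Framework (DPF), which simplifies DPU provisioning, lifecycle management, and service orchestration so that BlueField DPUs can be used more widely in Kubernetes environments to accelerate AI and other modern workloads. By simplifying DPU management, DPF reduces the complexity of scaling data centers and gives developers a consistent toolkit for managing software across BlueField DPU fleets, which speeds up the construction of cloud-native software platforms. DPF also provides standardized APIs and tools so that applications run seamlessly on BlueField-accelerated infrastructure, allowing service providers and enterprises to draw on a robust portfolio of accelerated services to build high-performance, secure, and efficient cloud platforms.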

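🚀 **Simplified DPU management**: By simplifying DPU provisioning, lifecycle management, and service orchestration, DPF makes BlueField DPUs more accessible in Kubernetes environments, accelerating AI and other modern workloads.

⚙️ **Kubernetes control plane extended to the DPU**: DPF extends Kubernetes control plane functionality to DPUs, so admins can deploy and orchestrate NVIDIA DOCA services and third-party DOCA services directly on BlueField DPUs.

🛡️ **Standardized APIs and tools**: DPF gives ISVs standardized APIs and tools so that their applications run seamlessly on BlueField-accelerated infrastructure, allowing service providers and enterprises to draw on a robust portfolio of accelerated services to build high-performance, secure, and efficient cloud platforms.

🔄 **Efficient DPU lifecycle management**: DPF provides end-to-end support for BlueField DPU provisioning and lifecycle management, automating processes such as firmware updates, flashing, and configuration to streamline setup and reduce downtime. Through exposed APIs and Custom Resource Definitions (CRDs), DPF automates the BlueField DPU lifecycle, so cloud operators can manage BlueField-bound services from their standard K8s control plane, with a unified "single pane of glass" view and control over both K8s worker nodes and DPUs.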

Organizations are increasingly turning to accelerated computing to meet the demands of generative AI, 5G telecommunications, and sovereign clouds. NVIDIA has unveiled the DOCA Platform Framework (DPF), providing foundational building blocks to unlock the power of NVIDIA BlueField DPUs and optimize GPU-accelerated computing platforms. Serving as both an orchestration framework and an implementation blueprint, DPF enables developers, service providers, and enterprises to seamlessly create BlueField-accelerated, cloud-native software platforms.

By simplifying DPU provisioning, lifecycle management, and service orchestration, DPF makes BlueField DPUs broadly accessible in Kubernetes environments for accelerating AI and other modern workloads. Additionally, DPF fortifies the vibrant ecosystem of BlueField-accelerated applications and services, fueling advancements in scalable cloud platforms.

Addressing a key gap in cloud infrastructure

NVIDIA’s commitment to the CPU-GPU-DPU trifecta is well known, and with the introduction of DPF, NVIDIA is taking a bold leap forward in the DPU aspect of this architecture. DPF marks an important step toward a more modern cloud infrastructure, helping to redefine how BlueField DPUs are integrated into data centers to address key challenges in performance, efficiency, and security.

NVIDIA BlueField DPUs already offer a high-performance, scalable alternative to traditional, CPU-centric infrastructure, offloading critical networking, storage, and security functions from host CPUs to accelerate data center operations. However, until now, managing DPU-driven services at data-center scale has been a fragmented and cumbersome process.

This is where DPF comes in: a dedicated framework that simplifies the deployment, orchestration, and scaling of BlueField-accelerated cloud infrastructure.

DPF extends Kubernetes control plane functionality to DPUs, enabling admins to deploy and orchestrate both NVIDIA DOCA services and third-party, DOCA-based services directly on BlueField DPUs. Equipped with a purpose-built SDK for seamless integration, DPF offers developers a consistent, modular toolkit to easily manage software across BlueField DPU fleets. This reduces time and complexity, enabling developers to focus on building robust software platforms and high-impact applications rather than managing DPU software orchestration.

Additionally, DPF plays a crucial role in the ecosystem by enabling infrastructure independent software vendors (ISVs) to build and integrate BlueField applications with confidence. By providing standardized APIs and tools, DPF ensures that these applications operate seamlessly on BlueField-accelerated infrastructure. This, in turn, benefits service providers and enterprises, enabling them to leverage a robust portfolio of accelerated services to build high-performance, secure, and efficient cloud platforms.

To simplify and streamline DPU management for cloud-native environments, DPF addresses two primary workflows:

DPU provisioning and lifecycle management: Covers the initial steps to deploy BlueField DPUs, including firmware and software installation and configuration, and ongoing maintenance tasks.

DPU service management and orchestration: Involves deploying and managing infrastructure services such as SDN controller software, storage target software, firewalls, load balancers, and more, including service function chaining.

Efficient DPU provisioning and lifecycle management

DPF provides end-to-end support for BlueField DPU provisioning and lifecycle management, automating processes like firmware updates, flashing, and configuration to streamline setup and reduce downtime.
Key tasks such as provisioning, configuration, monitoring, and troubleshooting are simplified, making it easier to integrate and operate BlueField DPUs at scale.

DPF maintains an updated state for each BlueField across the data center, enabling dynamic responsiveness to DPU health. When a DPU requires maintenance, DPF can proactively drain the node in a controlled manner, minimizing or eliminating impact to active production workloads. Through rolling update capabilities, admins can control batch updates by specifying a percentage of BlueField DPUs to update at a time, avoiding mass updates that could impact system stability. Real-time health monitoring and alerting equip admins to rapidly identify and address issues, essential for high-reliability environments like telecom and AI-powered data centers.

Through exposed APIs and Custom Resource Definitions (CRDs), DPF automates the BlueField DPU lifecycle, enabling cloud operators to manage BlueField-bound services from their standard K8s control plane, providing a unified “single pane of glass” view and control over both K8s worker nodes and DPUs. The DPF implementation blueprint is based on upstream Kubernetes, allowing technology partners to adapt and scale the framework for diverse infrastructure requirements and enterprise products.

Comprehensive DPU service management and orchestration

DPF brings a new level of sophistication to cloud-native environments by enabling seamless integration of BlueField DPUs into Kubernetes-based workflows. By introducing a dedicated, secondary Kubernetes control plane, DPF empowers admins to efficiently manage NVIDIA DOCA services and third-party, DOCA-based applications deployed on BlueField DPUs. The DPF Operator manages this secondary DPU Kubernetes control plane autonomously, overseeing all aspects of service deployment, monitoring, and lifecycle management.

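The rolling-update control described in the lifecycle management section, where admins cap each batch at a percentage of the fleet, can be pictured with a small sketch. This is not DPF code and does not use the DPF API: the function name `plan_rolling_batches`, the `dpu-N` names, and the round-down policy are all assumptions made purely for illustration.

```python
from typing import Iterator, List


def plan_rolling_batches(dpus: List[str], percent: int) -> Iterator[List[str]]:
    """Yield successive update batches sized as a percentage of the fleet.

    Hypothetical helper: it only illustrates the batching arithmetic and
    is not how DPF itself schedules updates.
    """
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # Round down so a batch stays within the requested share of the fleet,
    # but always update at least one DPU so that progress is possible.
    batch_size = max(1, len(dpus) * percent // 100)
    for start in range(0, len(dpus), batch_size):
        yield dpus[start:start + batch_size]


# Hypothetical fleet of ten DPUs, updated 30% at a time.
fleet = [f"dpu-{i}" for i in range(10)]
batches = list(plan_rolling_batches(fleet, 30))
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch}")
```

With ten DPUs and a 30% cap, the sketch produces batches of 3, 3, 3, and 1, so no more than three DPUs are out of service at any moment. A real controller would also wait for each batch to report healthy before starting the next.
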
DPF is designed to abstract DPU management complexity away from admins, who interact only with the primary Kubernetes control plane using familiar Kubernetes constructs and never need to manage the DPU control layer directly. DPF also provides flexibility for ISVs, enabling them to implement their own Kubernetes control plane for customized BlueField service management and orchestration.

By optimizing service orchestration across a fleet of BlueField DPUs, DPF simplifies the deployment and management of complex, distributed workloads. With robust lifecycle management capabilities, DPF supports seamless service updates, scaling, and rollbacks, ensuring that admins can efficiently manage changes without disrupting ongoing operations. Combined with DOCA service function chaining (SFC), DPF facilitates secure, efficient chaining of services, such as accelerated networking (CNIs), high-performance data services (CSIs), and firewall functions, to handle complex, multi-step tasks. To ensure smooth deployment, DPF provides predeployment verification, confirming the DPU can host required services and returning meaningful error messages when requirements aren’t met. Additionally, DPF offers monitoring and debuggability features to help admins manage and troubleshoot services in real time, making it easier to achieve high reliability and transparency.

Through DPF, admins gain intuitive, cloud-native tools for provisioning, managing, and orchestrating services on BlueField DPUs. This seamless integration with existing Kubernetes workflows accelerates time-to-deployment for advanced BlueField-accelerated applications across sectors such as telecommunications, cloud, and enterprise environments.

Modular architecture fosters ease of integration

DPF is designed with a modular architecture that simplifies integration and enables tailored functionality for BlueField-accelerated infrastructures.
This flexible design is built on a collection of core components and tools, giving developers, service providers, and enterprises a streamlined approach to provisioning and managing BlueField DPUs within cloud-native environments.

Figure 1 illustrates the DPF software stack, highlighting DPF functions operating on both the host and BlueField DPU. It also includes various infrastructure software services for networking, storage, and security, some of which expose accelerated IO interfaces to containerized workloads through Kubernetes plugins (CNI and CSI).

Figure 1. NVIDIA DPF stack

These tools and services, provided through containers, Helm charts, and an implementation blueprint, equip developers with everything needed to integrate and build on DPF.

DPF Operator

At the heart of the DPF orchestration layer is the DPF Operator, which automates DPU provisioning, lifecycle management, and service orchestration. It provides Kubernetes users with a familiar cloud-native interface, simplifying complex configurations and enabling BlueField DPUs to be deployed and managed just like other cluster resources. With built-in support for automated updates and resource management, the DPF Operator makes it easy to deploy and maintain BlueField DPUs in production environments.

DOCA for Host

The DOCA for Host software supplies a comprehensive set of provisioning tools that streamline the deployment and configuration of BlueField DPUs. DOCA for Host handles the firmware, BIOS, and system configurations needed to integrate the DPU with the host environment, ensuring a consistent and reliable setup across deployments.

OVS-DOCA

OVS-DOCA serves as the core networking stack within DPF, facilitating secure, high-performance network connectivity for BlueField-accelerated applications. It provides advanced networking functions and efficient traffic routing within Kubernetes environments, ensuring that BlueField resources can be fully utilized without compromising on performance or security.
This foundation enables developers to build high-throughput, latency-sensitive applications with ease.

DOCA Services

A curated set of DOCA services hosted on NVIDIA NGC enhances the capabilities of the BlueField DPU, with DPF providing the tools to fetch and deploy these services directly on the BlueField as part of the Kubernetes cluster. These ready-to-use services, which cover advanced monitoring, networking, storage, security, and more, expand BlueField functionality and enable rapid deployment of critical services. Through NVIDIA NGC, users gain seamless access to an expanding repository of NVIDIA-certified services and applications that fully integrate with DPF. The initial DPF release includes HBN, OVN-Kubernetes, Telemetry, and BlueMan as the first set of DOCA services, with subsequent releases set to introduce support for additional services to further enhance functionality and expand integration capabilities.

In addition to NVIDIA services, DPF orchestrates third-party DOCA services that bring specialized functionalities to the BlueField environment. From network security solutions to load balancing and firewall applications, third-party services enable users to create a robust ecosystem tailored to their specific needs. By embracing an open, modular architecture, DPF fosters collaboration with service vendors, providing users with a wider range of functionality and flexibility.

DPF empowers developers with the tools and services they need, packaged in containers, Helm charts, and an implementation blueprint, to easily integrate with DPF and build, customize, and deploy advanced BlueField-accelerated software platforms.

Lead the future of DPU-accelerated cloud computing with DPF

The NVIDIA DOCA Platform Framework (DPF) redefines cloud infrastructure for BlueField-accelerated environments, transforming how cloud services are provisioned and managed. In addition, the NVIDIA DPF roadmap signals exciting capabilities on the horizon.
Upcoming features will bring zero-trust capabilities to bare-metal, BlueField-accelerated infrastructures, securing environments from the hardware layer up.

Developers, telcos, and enterprises are encouraged to explore the capabilities of DPF, download the blueprint, and experiment with building applications optimized for high-performance and scalable infrastructures. Get started with DPF today and lead the future of BlueField-accelerated cloud infrastructure.
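As a closing illustration of the predeployment verification described in the service orchestration section (confirming a DPU can host the required services and reporting a meaningful error when it cannot), here is a minimal sketch. It is not the DPF implementation. The `ServiceRequirement` type, the `verify_predeployment` function, the resource names, and every number are hypothetical; only the service names HBN and Telemetry come from the article.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ServiceRequirement:
    """Hypothetical description of what one service needs on a DPU."""
    name: str
    needs: Dict[str, int]  # resource name -> amount required


def verify_predeployment(
    capacity: Dict[str, int], services: List[ServiceRequirement]
) -> List[str]:
    """Return readable errors; an empty list means every service fits."""
    errors: List[str] = []
    remaining = dict(capacity)  # copy so the caller's dict is not mutated
    for svc in services:
        shortfalls = [
            f"{res}: needs {amount}, {remaining.get(res, 0)} available"
            for res, amount in svc.needs.items()
            if remaining.get(res, 0) < amount
        ]
        if shortfalls:
            errors.append(f"{svc.name} cannot be scheduled ({'; '.join(shortfalls)})")
            continue
        # Reserve resources only for services that passed the check.
        for res, amount in svc.needs.items():
            remaining[res] -= amount
    return errors


# Made-up capacity and requirements, for illustration only.
capacity = {"cpu_cores": 8, "memory_mib": 16384}
services = [
    ServiceRequirement("hbn", {"cpu_cores": 4, "memory_mib": 4096}),
    ServiceRequirement("telemetry", {"cpu_cores": 2, "memory_mib": 2048}),
    ServiceRequirement("firewall", {"cpu_cores": 4, "memory_mib": 1024}),
]
errors = verify_predeployment(capacity, services)
print(errors or "all services fit")
```

In this made-up scenario the first two services fit, and the third is rejected with a message naming the resource that ran short. Collecting every error instead of failing on the first one gives an admin the full picture in a single pass.
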
