Eric Sloof - NTPRO.NL · June 11, 22:50
InfiniBand on VMware vSphere 8: Updated Setup and Performance Insights

This article walks through the process of configuring InfiniBand in a VMware vSphere 8 environment, focusing on performance optimization, SR-IOV setup, and troubleshooting. It highlights how vSphere 8 streamlines the workflow, for example through automation with vSphere Lifecycle Manager, and covers everything from confirming the driver and installing MFT and NMST to enabling SR-IOV and configuring the switches. It also presents performance test results that confirm InfiniBand's strength for HPC and AI workloads, and closes with best practices and troubleshooting guidance.

✅ Streamlined installation: vSphere 8 simplifies InfiniBand installation through vSphere Lifecycle Manager, letting administrators import packages and remediate clusters more efficiently with less manual deployment and downtime.

💡 Drivers and tools: vSphere 8 ships with the Mellanox driver pre-installed, which can be verified with a simple esxcli command. Installation of Mellanox Firmware Tools (MFT) and NMST has also been simplified and can be managed easily through vSphere Lifecycle Manager.

⚙️ SR-IOV configuration: Enabling SR-IOV is essential for workloads that need multiple IB cards, such as large language model training. The article provides steps for using an mlxconfig script to enable advanced PCI settings and configure virtual functions. Note that in vSphere 8, InfiniBand VFs may show as "Down" in the UI; this is expected behavior, and the actual link state should be verified at the VM level.

🚦 Switch configuration and performance: Administrators should make sure their IB switches run an up-to-date MLNX-OS and enable Open-SM virtualization support so that SR-IOV works correctly. Performance tests showed unidirectional bandwidth of 396.5 Gbps and bidirectional bandwidth of 790 Gbps, demonstrating InfiniBand's strong performance for HPC and AI workloads.

With the increasing demand for high-performance networking in virtualized environments, configuring InfiniBand on VMware vSphere 8 has become a critical task for many IT teams. This blog explores the updated process and considerations for setting up InfiniBand using the latest tools and practices, providing insights into performance, SR-IOV configuration, and common troubleshooting scenarios.

The integration of InfiniBand on vSphere 8 has been streamlined with enhancements to the vSphere Lifecycle Manager and improved native driver support. For those familiar with vSphere 7, many procedures remain consistent, but vSphere 8 introduces some important changes worth noting.

The first step is to confirm that the native Mellanox driver is already present on the ESXi host. With vSphere 8, the driver comes pre-installed, and verification can be done using a simple esxcli command.
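As a quick illustration, the check could look something like the commands below; nmlx5_core is the native ConnectX driver module on recent ESXi builds, but the exact module and VIB names on your host may differ.

```shell
# Confirm the Mellanox/NVIDIA native driver VIBs are installed
esxcli software vib list | grep -i nmlx

# List physical NICs and the driver each one is bound to
esxcli network nic list

# Inspect the nmlx5_core module (version, enabled state)
esxcli system module get -m nmlx5_core
```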

Installing Mellanox Firmware Tools, specifically MFT and NMST, is now much easier thanks to vSphere Lifecycle Manager. Instead of deploying packages manually across hosts, admins can use Lifecycle Manager to import and remediate clusters efficiently. These packages can be downloaded from NVIDIA’s website, and after uploading them to the vSphere Client, the entire cluster can be updated with minimal downtime.
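For a standalone lab host that is not managed by Lifecycle Manager, the same bundles can usually be installed manually with esxcli; the paths and file names below are placeholders for whichever MFT and NMST packages you downloaded from NVIDIA.

```shell
# Install the downloaded offline bundles (placeholder paths and file names)
esxcli software component apply -d /vmfs/volumes/datastore1/MFT-offline-bundle.zip
esxcli software component apply -d /vmfs/volumes/datastore1/NMST-offline-bundle.zip

# Older releases ship as VIB bundles instead of components; in that case use
#   esxcli software vib install -d <bundle.zip>

# A reboot is normally required before the mst tools become usable
reboot
```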

In some cases, InfiniBand cards may not be visible via the mst status command after installation. This can typically be resolved by putting the native Mellanox driver into recovery mode using specific esxcli module parameters, followed by a couple of host reboots.
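The recovery-mode switch is exposed as a driver module parameter; the sequence below is a sketch, and the mst_recovery parameter name is an assumption, so check the MFT release notes for the exact name in your version.

```shell
# Put the native driver into recovery mode (the mst_recovery parameter name is an assumption)
esxcli system module parameters set -m nmlx5_core -p "mst_recovery=1"
reboot

# After the reboot, the cards should be visible again (default MFT path on ESXi)
/opt/mellanox/bin/mst status

# Once done, clear the parameter and reboot a second time to return to normal operation
esxcli system module parameters set -m nmlx5_core -p ""
reboot
```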

Enabling SR-IOV is particularly relevant for workloads like large language model training that require multiple IB cards. A script using mlxconfig can be used to enable advanced PCI settings and configure virtual functions on each device. It's important to remember that in vSphere 8, InfiniBand VFs may appear as 'Down' in the UI, which is expected behavior. The actual link state should be verified at the VM level.
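A minimal sketch of such a script follows; the device names, the VF count, and the decision to toggle ADVANCED_PCI_SETTINGS are example values to adapt, not settings taken from the paper.

```shell
#!/bin/sh
# Placeholder device names -- use the names reported by 'mst status' on your host
DEVICES="mt4123_pciconf0 mt4123_pciconf1"

for dev in $DEVICES; do
    # Enable advanced PCI settings and SR-IOV with 8 VFs per card (example values)
    /opt/mellanox/bin/mlxconfig -d "$dev" -y set ADVANCED_PCI_SETTINGS=1 SRIOV_EN=1 NUM_OF_VFS=8
done

# Ask the native driver to create the VFs (one count per physical function), then reboot
esxcli system module parameters set -m nmlx5_core -p "max_vfs=8,8"
reboot
```

Inside the guest, tools such as ibstat or ibstatus report the real link state of the VF, which is the place to check when the vSphere UI lists the adapter as Down.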

For environments that continue to use PCI passthrough rather than SR-IOV, disabling SR-IOV can be done with a similar script that reverts the card settings and resets the ESXi module parameters.
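A matching revert script could look like the sketch below, again with placeholder device names.

```shell
#!/bin/sh
# Placeholder device names -- use the names reported by 'mst status'
DEVICES="mt4123_pciconf0 mt4123_pciconf1"

for dev in $DEVICES; do
    # Disable SR-IOV and remove the virtual functions
    /opt/mellanox/bin/mlxconfig -d "$dev" -y set SRIOV_EN=0 NUM_OF_VFS=0
done

# Clear the VF module parameter so the cards present as plain physical functions again
esxcli system module parameters set -m nmlx5_core -p ""
reboot
```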

On the switching side, administrators should ensure their IB switches are running an up-to-date MLNX-OS. Compatibility between switch firmware and adapter firmware is key to avoiding communication issues. Enabling Open-SM virtualization support on the switches is also critical to support SR-IOV functionality.
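The exact switch CLI varies by MLNX-OS release, so the managed-switch commands below are an assumption-level sketch rather than verified syntax; for a host-based subnet manager, the equivalent knob is the virt_enabled option in opensm.conf.

```shell
# Managed switch (MLNX-OS) -- command names are assumptions, confirm against your switch manual
enable
configure terminal
show version            # check the running MLNX-OS release
ib sm virt enable       # let the embedded subnet manager handle virtual functions
no ib sm
ib sm                   # restart the SM so the change takes effect

# Host-based OpenSM alternative: set the following in opensm.conf and restart opensm
#   virt_enabled 2
```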

The paper outlines several behavioral nuances when using passthrough versus SR-IOV. For instance, certain Open-SM utilities may not function when a virtual function is passed to a VM, which is normal. Likewise, mst status behavior may differ depending on whether a physical or virtual function is used.

Troubleshooting steps are also provided for cases where MLX cards fail to appear in the mst status output. These include reloading the appropriate kernel modules and signaling system processes, after which recovery mode is temporarily enabled until the next host reboot.

Performance tests revealed that unidirectional bandwidth reached 396.5 Gbps using four queue pairs, nearly saturating the theoretical line rate of the InfiniBand cards. Bidirectional bandwidth tests showed performance scaling up to 790 Gbps with two cards, confirming the setup’s ability to handle demanding HPC and AI workloads.
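Bandwidth figures like these are typically collected with the perftest suite; a hedged example of a four-queue-pair ib_write_bw run between two VMs is shown below, with the device name and server host name as placeholders.

```shell
# On the server VM (placeholder device name)
ib_write_bw -d mlx5_0 -q 4 -F --report_gbits

# On the client VM: unidirectional write bandwidth against the server
ib_write_bw -d mlx5_0 -q 4 -F --report_gbits server-vm

# Add -b for the bidirectional variant
ib_write_bw -d mlx5_0 -q 4 -b -F --report_gbits server-vm
```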

In conclusion, VMware vSphere 8 enhances the experience of deploying InfiniBand by introducing automation through Lifecycle Manager and retaining robust performance tuning capabilities. With updated best practices, simplified installation, and clear guidance on SR-IOV and troubleshooting, IT teams can now fully leverage InfiniBand’s potential in virtualized environments, including VMware Cloud Foundation.

This blog captures the essence of the technical paper authored by Yuankun Fu, who has a strong background in HPC and AI performance optimization within VMware. His guidance in this paper provides both practical instructions and valuable performance data for teams looking to adopt or enhance InfiniBand in their vSphere environments.
