MarkTechPost@AI · 15 hours ago
NVIDIA XGBoost 3.0: Training Terabyte-Scale Datasets with Grace Hopper Superchip

NVIDIA has released XGBoost 3.0, a major advance in scalable machine learning. The new version can train gradient-boosted decision tree (GBDT) models on datasets ranging from gigabytes up to 1 terabyte (TB) on a single GH200 Grace Hopper Superchip. The breakthrough is enabled by the External-Memory Quantile DMatrix newly introduced in XGBoost 3.0, which streams data directly from host memory to the GPU and overcomes the memory limits of traditional GPU training. Institutions such as the Royal Bank of Canada (RBC) have reported speedups of up to 16x and a 94% reduction in total cost of ownership (TCO). The technology simplifies the scaling of machine-learning pipelines, giving financial and enterprise users a faster, cheaper, and easier-to-manage solution and democratizing access to massive machine learning.

🚀 XGBoost 3.0 breaks the memory barrier: with the new External-Memory Quantile DMatrix, XGBoost 3.0 can for the first time process datasets of up to 1 TB on a single NVIDIA GH200 Grace Hopper Superchip, removing the bottleneck of traditional GPU training being capped by GPU memory and making terabyte-scale machine-learning training practical.

💡 Simplification and acceleration: the new release uses the Grace Hopper Superchip's coherent memory architecture and ultra-high bandwidth to stream data directly from host RAM to the GPU, with no need for complex multi-node frameworks. This greatly simplifies scaling machine-learning pipelines and, as in the Royal Bank of Canada (RBC) case, delivers speedups of up to 16x and a 94% reduction in total cost of ownership (TCO).

⚙️ Implementation and ease of use: the External-Memory Quantile DMatrix pre-bins and compresses the data and streams it on demand, reducing GPU memory usage while preserving model accuracy. For data-science teams already using RAPIDS, upgrading to the new approach requires only minor code changes and integrates easily.

🌟 Other new features in XGBoost 3.0: beyond the core external-memory capability, the release adds experimental support for distributed external memory across GPU clusters, as well as support for categorical features, quantile regression, and SHAP explainability in external-memory mode, further extending the flexibility and reach of model training.

NVIDIA has unveiled a major milestone in scalable machine learning: XGBoost 3.0, now able to train gradient-boosted decision tree (GBDT) models on datasets ranging from gigabytes up to 1 terabyte (TB) on a single GH200 Grace Hopper Superchip. The breakthrough enables companies to process immense datasets for applications like fraud detection, credit risk modeling, and algorithmic trading, simplifying the once-complex process of scaling ML pipelines.

Breaking Terabyte Barriers

At the heart of this advancement is the new External-Memory Quantile DMatrix in XGBoost 3.0. Traditionally, GPU training was limited by the available GPU memory, capping achievable dataset size or forcing teams to adopt complex multi-node frameworks. The new release leverages the Grace Hopper Superchip's coherent memory architecture and ultra-fast 900 GB/s NVLink-C2C bandwidth. This enables direct streaming of pre-binned, compressed data from host RAM into the GPU, overcoming bottlenecks and memory constraints that previously required high-memory servers or large GPU clusters.
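The streaming pattern can be expressed with XGBoost's data-iterator interface. The following is a minimal sketch, assuming XGBoost 3.0 built with CUDA support and its documented external-memory API (xgboost.DataIter and xgboost.ExtMemQuantileDMatrix); the shard file names and loader are hypothetical, not NVIDIA's pipeline.

```python
# Minimal external-memory training sketch (assumes XGBoost >= 3.0 with CUDA support).
import numpy as np
import xgboost as xgb


class ShardIter(xgb.DataIter):
    """Feeds the dataset to XGBoost one host-resident shard at a time."""

    def __init__(self, shards):
        self._shards = shards
        self._pos = 0
        super().__init__()

    def next(self, input_data):
        # Called repeatedly by XGBoost; return False once all shards are consumed.
        if self._pos == len(self._shards):
            return False
        X = np.load(self._shards[self._pos]["X"])
        y = np.load(self._shards[self._pos]["y"])
        input_data(data=X, label=y)
        self._pos += 1
        return True

    def reset(self):
        # Rewind so XGBoost can stream the shards again for the next pass.
        self._pos = 0


# Hypothetical shard layout: eight .npy feature/label files staged in host RAM.
shards = [{"X": f"part{k}_X.npy", "y": f"part{k}_y.npy"} for k in range(8)]
it = ShardIter(shards)

# The quantile DMatrix pre-bins and compresses the streamed batches instead of
# materializing the full dataset in GPU memory.
dtrain = xgb.ExtMemQuantileDMatrix(it, max_bin=256)

booster = xgb.train(
    {"device": "cuda", "tree_method": "hist", "objective": "binary:logistic"},
    dtrain,
    num_boost_round=500,
)
```

On a Grace Hopper node, the host-resident batches are pulled over NVLink-C2C as each boosting round consumes them, so GPU memory only ever holds the compressed working set rather than the full terabyte-scale dataset.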

Real-World Gains: Speed, Simplicity, and Cost Savings

Institutions like the Royal Bank of Canada (RBC) have reported up to 16x speed boosts and a 94% reduction in total cost of ownership (TCO) for model training by moving their predictive analytics pipelines to GPU-powered XGBoost. This leap in efficiency is crucial for workflows with constant model tuning and rapidly changing data volumes, allowing banks and enterprises to optimize features faster and scale as data grows.

How It Works: External Memory Meets XGBoost

The new external-memory approach introduces several innovations: data is pre-binned and compressed on the host, then streamed into the GPU on demand, cutting GPU memory consumption while preserving model accuracy.

Technical Best Practices

Upgrades

For data-science teams already using RAPIDS, switching to the new external-memory path requires only a few lines of changed code, as illustrated in the sketch below.
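A hedged illustration of that small change, reusing the hypothetical ShardIter iterator and shards list defined in the earlier sketch; variable names are illustrative, not NVIDIA's code.

```python
import xgboost as xgb

# Before: an in-core quantile DMatrix, where the whole dataset must fit in device memory.
# dtrain = xgb.QuantileDMatrix(X_gpu, label=y_gpu, max_bin=256)

# After: the same training call, but data is streamed batch-by-batch from host memory
# through the external-memory quantile DMatrix.
dtrain = xgb.ExtMemQuantileDMatrix(ShardIter(shards), max_bin=256)

params = {"device": "cuda", "tree_method": "hist"}
booster = xgb.train(params, dtrain, num_boost_round=500)  # training call unchanged
```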

Other highlights in XGBoost 3.0 include experimental support for distributed external memory across GPU clusters, along with support for categorical features, quantile regression, and SHAP explainability when running in external-memory mode.
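A brief sketch of how those capabilities might be combined with the external-memory path. It assumes the same iterator pattern as above and the standard XGBoost parameter names for quantile regression ("reg:quantileerror", quantile_alpha) and SHAP contributions (pred_contribs=True); whether every option is supported in your exact configuration should be checked against the release notes.

```python
import xgboost as xgb

# Reuse an iterator like ShardIter above. enable_categorical lets columns marked as
# categorical (e.g. a pandas/cuDF "category" dtype yielded by the iterator) be used
# directly; shown here for illustration only.
dtrain = xgb.ExtMemQuantileDMatrix(ShardIter(shards), max_bin=256, enable_categorical=True)

# Quantile regression in external-memory mode: fit the 10th, 50th, and 90th percentiles.
params = {
    "device": "cuda",
    "tree_method": "hist",
    "objective": "reg:quantileerror",
    "quantile_alpha": [0.1, 0.5, 0.9],
}
booster = xgb.train(params, dtrain, num_boost_round=200)

# SHAP-style explanations: per-feature contribution values for each row.
shap_values = booster.predict(dtrain, pred_contribs=True)
```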

Industry Impact

By bringing terabyte-scale GBDT training to a single chip, NVIDIA democratizes access to massive machine learning for both financial and enterprise users, paving the way for faster iteration, lower cost, and lower IT complexity.

XGBoost 3.0 and the Grace Hopper Superchip together mark a major leap forward in scalable, accelerated machine learning.



