NVIDIA Developer · February 16
NVIDIA Hackathon Winners Share Strategies for RAPIDS-Accelerated ML Workflows

The NVIDIA Hackathon at ODSC West brought together roughly 220 teams for a 24-hour machine learning competition. The winning teams used RAPIDS Python APIs to speed up data processing and improve both model accuracy and speed, earning prizes that included NVIDIA RTX Ada Generation GPUs and Google Colab credits. Contestants worked with synthetic tabular data covering 12 million subjects described by more than 100 anonymized features, building regression models to predict a target variable and tuning them by minimizing root mean squared error (RMSE). Accelerated computing sped up data processing through familiar PyData libraries with no code changes, helping teams keep pace with ever-growing data volumes and increasingly complex workflows.

🏆 The NVIDIA Hackathon was a 24-hour machine learning competition in which participants were given roughly 10 GB of synthetic tabular data describing 12 million subjects, each with more than 100 anonymized features, and asked to build a regression model that predicts the target variable y while minimizing root mean squared error (RMSE), balancing accuracy and speed.
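For reference, the RMSE being minimized is the standard definition: with y_i the true target and ŷ_i the model's prediction over n test subjects,

    RMSE = sqrt( (1/n) * Σ (y_i − ŷ_i)² )

so large individual errors are penalized quadratically, rewarding models that avoid occasional big misses.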

🚀 The winning teams used RAPIDS Python APIs to get GPU acceleration through the PyData libraries they already knew, with no code changes, letting them process large volumes of data much faster and demonstrating the advantages of accelerated computing for ever-growing data volumes.

💡 First-place winner Shyamal Shah accelerated pandas operations with the cuDF pandas extension, discovered that 20 numerical features were effectively duplicates, kept the single "magical" column with the fewest null values as their representative, applied target mean encoding to the high-cardinality categorical variables, and trained with the LightGBM framework, completing training and prediction in 1 minute 47 seconds.

⚙️ For data processing, participants could use RAPIDS cuDF (through pandas or Polars), along with RAPIDS cuML or XGBoost, to accelerate data processing and model training; they were also encouraged to perform exploratory data analysis (EDA), apply feature engineering, and ensemble multiple machine learning algorithms.

Approximately 220 teams gathered at the Open Data Science Conference (ODSC) West this year to compete in the NVIDIA hackathon, a 24-hour machine learning (ML) competition. Data scientists and engineers designed models that were evaluated based on accuracy and processing speed. The top three teams walked away with prize packages that included NVIDIA RTX Ada Generation GPUs, Google Colab credits, and more. To earn these top spots, the winning teams leveraged RAPIDS Python APIs to produce the most accurate and performant solutions.

During his talk at ODSC, Nick Becker, product lead for RAPIDS AI at NVIDIA, highlighted that the computational demands of AI, coupled with the ever-increasing volumes of generated data, are fueling data processing as the next phase of accelerated computing. Today, approximately 403 million terabytes of data are generated per day, putting immense pressure on data centers to process more data efficiently to achieve higher accuracy, privacy, and faster response times. As businesses operationalize and streamline AI systems end-to-end, they need to address related data processing bottlenecks. Accelerated computing enables more efficient processing for today's increasingly complex workflows.

The NVIDIA Hackathon demonstrated how data scientists can swiftly tackle growing volumes of data and process them faster by leveraging GPU acceleration through PyData libraries, all while using the syntax they already know, with no code changes required. Participants were provided with approximately 10 GB of synthetic tabular data, containing information on 12 million subjects, each described by over 100 anonymous features, both categorical and numerical. Their task was to build a regression model to predict the target variable, y, and minimize root mean squared error (RMSE) to achieve both accuracy and speed. They had 24 hours to solve the problem and optimize their solutions.

Participants leveraged RAPIDS cuDF through pandas or Polars, and some used RAPIDS cuML or XGBoost to optimize data processing and model training. Participants were encouraged to apply exploratory data analysis (EDA) and feature engineering, and to ensemble multiple ML algorithms.

This post features insights and strategies from the top three winners: Shyamal Shah; Feifan Liu with teammates Himalaya Dua and Sara Zare; and Lorenzo Mondragon. In their own words, they share how they approached the challenge, along with tips and tricks for how they produced the fastest, most accurate solutions.

Figure 1. More than 1,000 people participated in the NVIDIA hackathon at ODSC West 2024

First place winner: Shyamal Shah

The NVIDIA hackathon challenged me to analyze an extensive tabular dataset using powerful NVIDIA GPUs through Google Colab. My approach prioritized both computational efficiency and predictive accuracy through several key optimizations. First, I leveraged the NVIDIA RAPIDS ecosystem by utilizing the cuDF pandas extension, which automatically accelerated pandas operations on the GPU.
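For readers unfamiliar with the cuDF pandas extension, enabling it is a one-line change made before importing pandas; the sketch below illustrates the general pattern (the file name is an assumption, not part of the competition setup):

    # In a Jupyter or Colab notebook: load the cuDF pandas accelerator
    # before importing pandas, then use pandas exactly as usual.
    %load_ext cudf.pandas

    import pandas as pd

    # DataFrame operations now run on the GPU where supported, silently
    # falling back to CPU pandas otherwise. "train.csv" is illustrative.
    train_df = pd.read_csv("train.csv")
    print(train_df.describe())

Outside a notebook, the same effect can be achieved by running a script with python -m cudf.pandas script.py, or by calling cudf.pandas.install() before the pandas import.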
Through detailed feature analysis, I discovered that 20 numerical features were effectively duplicates, sharing identical statistical properties when normalized. This insight led me to select just one representative numerical feature, the "magical" column, which had the lowest number of null values.

    # Calculate statistics from training data
    base_median = train_df[base_feature].median()
    Q1 = train_df[base_feature].quantile(0.25)
    Q3 = train_df[base_feature].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    # Process base feature
    df_processed['magical'] = df['magical'].fillna(base_median).clip(lower_bound, upper_bound)

For the high-cardinality categorical variables, I implemented target mean encoding with smoothing instead of traditional one-hot encoding, which would have significantly increased the feature dimensionality. By narrowing down the original 106 features to just three key predictors, I substantially reduced the computational overhead while maintaining predictive power.

    # Calculate robust target encodings for high-cardinality categorical variables
    cat_encodings = {}
    global_mean = train_df['y'].mean()

    for col in ['trickortreat', 'kingofhalloween']:
        # Group by category and calculate stats
        cat_stats = (train_df.groupby(col)['y']
                     .agg(['mean', 'count'])
                     .reset_index())

        # Only keep categories that appear more than once
        frequent_cats = cat_stats[cat_stats['count'] > 1]

        # Strong smoothing factor due to high cardinality
        smoothing = 100

        # Calculate smoothed means with stronger regularization
        frequent_cats['encoded'] = (
            (frequent_cats['count'] * frequent_cats['mean'] + smoothing * global_mean)
            / (frequent_cats['count'] + smoothing)
        )

        # Create dictionary only for frequent categories
        cat_encodings[col] = dict(zip(frequent_cats[col], frequent_cats['encoded']))

    # Process categorical features
    for col in ['trickortreat', 'kingofhalloween']:
        # Map categories to encodings, with special handling for rare/unseen categories
        df_processed[f'{col}_encoded'] = (
            df[col].map(cat_encodings[col])
                   .fillna(global_mean)  # use global mean for rare/unseen categories
        )

The implementation used Microsoft's LightGBM framework, chosen specifically for its GPU optimization and top-level performance boosting capabilities on large datasets. Through careful parameter tuning and experimental iterations, I optimized the model's hyperparameters to balance training speed and accuracy. The final solution completed the training and prediction cycle in just 1 minute and 47 seconds while achieving high accuracy. This experience demonstrated how combining GPU-accelerated computing with thoughtful feature engineering and algorithm selection can lead to both efficient and accurate solutions when working with large-scale datasets.
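The post does not reproduce the LightGBM training code itself; the sketch below shows what a GPU-enabled LightGBM regressor over the three engineered predictors might look like. The dataframe names, feature names, and hyperparameters here are assumptions for illustration, not the winning configuration:

    import lightgbm as lgb

    # Assumed: train_processed/test_processed hold the engineered features from
    # the snippets above, and train_df['y'] is the training target.
    features = ['magical', 'trickortreat_encoded', 'kingofhalloween_encoded']

    # device='gpu' requires a GPU-enabled LightGBM build; parameters are placeholders.
    model = lgb.LGBMRegressor(
        objective='regression',
        metric='rmse',
        n_estimators=500,
        learning_rate=0.05,
        num_leaves=63,
        device='gpu',
        random_state=42,
    )
    model.fit(train_processed[features], train_df['y'])
    predictions = model.predict(test_processed[features])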
Figure 2. NVIDIA headquarters tour with hackathon winners (left to right) Himalaya Dua, Sara Zare, and Shyamal Shah

Figure 3. NVIDIA headquarters tour with hackathon winner Feifan Liu

Second place winner: Feifan Liu, PhD, and teammates Himalaya Dua and Sara Zare

From my perspective, I think cuDF pandas is really efficient and easy to use. There is no need to learn new APIs for people who are already familiar with the original pandas. It makes loading and manipulating large volumes of data possible. One tip is to avoid complex preprocessing, for example, imputation. Directly assigning missing values as -1 (that is, creating an additional dimension in feature space) is effective for both performance and efficiency.

    import numpy as np

    train_df = df.copy()
    # train_df = sample_20_df.copy()

    categorical_cols = train_df.select_dtypes(include=['object', 'category']).columns.tolist()
    numerical_cols = train_df.select_dtypes(include=['number']).columns.tolist()

    # Numeric columns whose only negative value is the -1 missing-value sentinel
    num_col_only_minus_one = [
        col for col in numerical_cols
        if (train_df[col] < 0).sum() > 0
        and (train_df[col] < 0).sum() == (train_df[col] == -1).sum()
    ]

    train_df[categorical_cols] = train_df[categorical_cols].astype('category')
    train_df[num_col_only_minus_one] = train_df[num_col_only_minus_one].replace(-1, np.nan)
    test_df[categorical_cols] = test_df[categorical_cols].astype('category')
    test_df[num_col_only_minus_one] = test_df[num_col_only_minus_one].replace(-1, np.nan)

Another tip is to leverage the CUDA support inside XGBoost for accelerated training.

    import xgboost as xgb

    # Baseline parameters
    xgb_regressor = xgb.XGBRegressor(objective='reg:squarederror',
                                     eval_metric='rmse',
                                     max_depth=5,
                                     n_estimators=500,
                                     random_state=42,
                                     device='cuda',
                                     enable_categorical=True)

Third place winner: Lorenzo Mondragon

To tackle the challenge, I leveraged RAPIDS to integrate GPU acceleration into both Polars and pandas DataFrames. This enabled efficient preprocessing of the 12 million rows of tabular data, including handling missing values, encoding categorical features, and sampling data to optimize for model training. For the regression task, I utilized XGBoost with GPU support (the gpu_hist tree method) to train a model with hyperparameters fine-tuned for both accuracy and performance. I focused on:

- Filling numeric features with column means and categorical features with "Unknown".
- Encoding categorical data into compact UInt32 formats to improve memory efficiency.
- Experimenting with lazy loading and sampling through Polars for faster data ingestion and manipulation (a sketch of this pattern follows the code below).

    import polars as pl
    import polars.selectors as cs

    # 1. Handle missing values
    numeric_cols = train_data.select(cs.numeric()).columns
    categorical_cols = [
        col for col in train_data.columns
        if col not in numeric_cols and col not in ['id', 'y']
    ]

    # Fill missing values
    df = train_data.with_columns(
        # Fill numeric columns with mean
        [pl.col(col).fill_null(pl.col(col).mean()).alias(col) for col in numeric_cols]
        # Fill categorical columns with 'Unknown'
        + [pl.col(col).fill_null("Unknown").alias(col) for col in categorical_cols]
    )
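The lazy loading and GPU execution mentioned above can be combined through the Polars GPU engine; the following is a minimal, illustrative sketch that assumes the training data lives in a hypothetical train.csv:

    import polars as pl

    # Build a lazy query: nothing is read or computed until .collect().
    lazy = (
        pl.scan_csv("train.csv")   # assumed file name
          .drop_nulls(subset=["y"])
          .with_columns(pl.col("y").cast(pl.Float64))
    )

    # Execute on the GPU via RAPIDS; Polars falls back to the CPU engine
    # if the query or environment is not supported.
    df = lazy.collect(engine="gpu")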
In the evaluation phase, the combination of Polars for preprocessing and GPU-accelerated XGBoost allowed me to strike a balance between model accuracy and inference speed. While my model ranked ninth in terms of accuracy, the efficiency gains from RAPIDS boosted my solution to third place overall once performance metrics were factored in. A few takeaways from the experience:

- GPU acceleration is a game-changer: Using RAPIDS significantly reduced data preprocessing and model training times, making it feasible to process massive datasets within tight time constraints.
- Seamlessly integrate with familiar tools: Adopting RAPIDS required minimal changes to existing pandas and Polars workflows, highlighting the accessibility of GPU-accelerated libraries for data science practitioners.
- Optimization requires balance: While accuracy is crucial, optimizing for speed can be equally impactful in real-world scenarios where latency and resource efficiency are critical.
- Community and support matter: The resources and expert advice available during the hackathon were invaluable, especially when navigating cutting-edge tools like the Polars GPU engine and RAPIDS.

Learn more

If you're new to RAPIDS, check out these resources to get started, and test drive these tutorials for cuDF pandas and Polars. You can watch the webinar, Unlock Hackathon Success with NVIDIA: Tools and Q&A with NVIDIA Kaggle Grandmaster Jiwei Liu, to learn how to leverage GPU acceleration using cuDF pandas or Polars, explore feature engineering techniques, and gain insights from this notebook. You can also have a look at the sample notebooks created for the hackathon, one for cuDF pandas and one for the Polars GPU Engine, also created by NVIDIA Kaggle Grandmasters.

