Nvidia Developer, February 16
Accelerating Time Series Forecasting with RAPIDS cuML

 

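This article describes how to use RAPIDS cuML to accelerate time series forecasting with skforecast. As data volume and forecast horizon grow, the computational cost of direct multi-step forecasting rises quickly. RAPIDS cuML is a GPU-accelerated machine learning library that integrates seamlessly into skforecast workflows. Replacing scikit-learn regressors with cuML's GPU-accelerated versions can significantly speed up forecasting on large datasets. In the article's experiment, cuML made the forecast 25x faster, which allows quicker iteration and hyperparameter optimization. This lets enterprises make more timely decisions and respond to changing market conditions.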

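📈 Time series forecasting is critical in today's data-driven world: enterprises rely on it to make informed decisions, optimize processes, and mitigate risks. Accurate forecasts are essential for planning and strategy, for example when predicting stock market trends, sudden changes in supply or demand, or the spread of diseases.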

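⏱️ Direct multi-step forecasting is a popular technique that uses a separate model to predict each future value in the forecast horizon. It can produce better results in some situations, but it is also more computationally expensive because multiple models must be trained.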

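🚀 RAPIDS cuML can be dropped into existing skforecast workflows to accelerate direct multi-step forecasting. Replacing the scikit-learn regressor with cuML's GPU-accelerated version significantly speeds up forecasting with minimal code changes. For example, on a large dataset with hundreds of thousands of records, cuML makes the forecast 25x faster.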

Time series forecasting is a powerful data science technique used to predict future values based on data points from the past.

Open source Python libraries like skforecast make it easy to run time series forecasts on your data. They allow you to “bring your own” regressor that is compatible with the scikit-learn API, giving you the flexibility to work seamlessly with the model of your choice.

With growing datasets and techniques like direct multi-step forecasting that require you to run several models at once, forecasts can quickly become computationally expensive when running on CPU-based infrastructure.

RAPIDS is a collection of open-source GPU-accelerated data science and AI libraries. cuML is a GPU-accelerated machine learning library for Python with a scikit-learn compatible API.

In this blog post, we show how RAPIDS cuML can be used with skforecast to accelerate time series forecasting, allowing you to work with larger datasets and forecast windows.

Why time series forecasting?

In today’s data-driven world, enterprises rely on time series forecasting to make informed decisions, optimize processes, and mitigate risks. Whether it’s predicting stock market trends, sudden changes in supply or demand, or the spread of diseases, accurate forecasting is essential for planning and strategy.

Historically, monthly or weekly forecasting may have been adequate to support decision making. But with the exponential growth of data and rise in global uncertainty, organizations now need to be able to run forecasts in near real-time to make proactive decisions about their business.

Multistep forecasting

One popular technique used in time series forecasting is recursive multi-step forecasting, in which you train a single model and apply it recursively to predict the next n values in the series. In contrast, direct multi-step forecasting uses a separate model to predict each future value in your forecast horizon.
In other words, you are “directly” trying to forecast n steps ahead, rather than getting there via recursion. This can produce much better results in certain situations, but is also more computationally expensive since it requires training multiple models.

Bringing accelerated computing to direct multistep forecasting

RAPIDS cuML can be dropped into existing skforecast workflows. In the example below, we create a synthetic hourly time series with a 24-hour seasonal cycle and positive drift. We then use skforecast’s ForecasterDirect class for direct multi-step forecasting and replace the scikit-learn regressor with cuML’s RandomForestRegressor:

    import numpy as np
    import pandas as pd
    from skforecast.direct import ForecasterDirect
    from sklearn.ensemble import RandomForestRegressor

    USE_GPU = False  # set to True to fit with cuML on the GPU

    # Parameters
    n_records = 100000
    drift_rate = 0.001
    seasonality_period = 24
    start_date = '2010-01-01'

    # Create synthetic dataset with positive drift
    date_rng = pd.date_range(start=start_date, periods=n_records, freq='h')
    np.random.seed(42)
    noise = np.random.randn(n_records)
    drift = np.cumsum(np.ones(n_records) * drift_rate)
    seasonality = np.sin(
        np.linspace(0, 2 * np.pi, n_records) * (n_records / seasonality_period)
    )
    data = noise + drift + seasonality
    df = pd.DataFrame(data, index=date_rng, columns=['y'])

    if USE_GPU:
        import cuml  # imported here so the CPU path runs without RAPIDS installed

        forecaster = ForecasterDirect(
            regressor=cuml.ensemble.RandomForestRegressor(
                n_estimators=200,
                max_depth=13,
            ),
            steps=100,
            lags=100,
            n_jobs=1,
        )
    else:
        forecaster = ForecasterDirect(
            regressor=RandomForestRegressor(
                n_estimators=200,
                max_depth=13,
                n_jobs=-1  # parallelize Random Forest to use all CPU cores
            ),
            steps=100,
            lags=100,
            n_jobs=1,
        )

    forecaster.fit(y=df['y'])
    predictions = forecaster.predict()

With large datasets containing hundreds of thousands of records, CPU-based regressors can take a long time to churn through each forecast. Recall that with direct multi-step forecasting we are training a separate model for every step in the forecast.
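
To make that cost concrete, here is a minimal pure-Python sketch of how a direct strategy can lay out its training data: one (X, y) pair per step in the horizon, each of which needs its own model fit. The helper name make_direct_training_set is hypothetical and the row alignment is an assumption for illustration; this is not skforecast's internal implementation.

```python
def make_direct_training_set(series, lags, step):
    """Return (X, y) for the model that predicts `step` steps ahead.

    Each row of X holds `lags` consecutive observations; the matching
    entry of y is the value `step` positions after the end of that window.
    """
    X, y = [], []
    for end in range(lags, len(series) - step + 1):
        X.append(series[end - lags:end])   # the lag window
        y.append(series[end + step - 1])   # target `step` steps ahead
    return X, y


# Toy series 0..9, so every target is easy to check by eye.
series = list(range(10))
lags, steps = 3, 2

# One training set per step in the horizon, hence one model fit per step.
training_sets = {
    step: make_direct_training_set(series, lags, step)
    for step in range(1, steps + 1)
}

for step, (X, y) in training_sets.items():
    print(f"step {step}: {len(X)} rows, first window {X[0]} -> target {y[0]}")
```

With steps=100, as in the example above, a loop like this would yield 100 training sets and therefore 100 separate Random Forest fits, which is why fit time grows with the forecast horizon.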
Running the ForecasterDirect forecast above on the CPU took over 43 minutes. Switching to cuML’s GPU-accelerated regressor allows the entire forecast to finish in just 103 seconds, a 25x speedup with minimal code changes.

Figure 1. Time to fit the skforecast ForecasterDirect using Random Forest Regression from scikit-learn (CPU) vs. RAPIDS cuML (GPU) as the underlying regressor. Run on an NVIDIA H100 GPU vs. a dual socket Intel(R) Xeon(R) Platinum 8480CL CPU.

Because the forecast runs faster, we can iterate much more quickly and perform hyperparameter optimization to find the best fit, or try out other regressors supported by cuML.

Conclusion

Time series forecasting has been around for decades but remains incredibly important today. Techniques like direct multi-step forecasting can be useful for optimizing forecasts, but become much more computationally expensive as the size of your data and forecast horizon grow. Using accelerated computing libraries like RAPIDS cuML with skforecast makes it easy to accelerate your forecasting jobs with minimal code changes.

To learn more about accelerated machine learning, visit the cuML documentation, or take the Fundamentals of Accelerated Data Science course from NVIDIA Deep Learning Institute.
