MarkTechPost@AI, February 9
Sundial: A New Era for Time Series Foundation Models with Generative AI

Sundial is a generative time series foundation model that overcomes the limitations of traditional forecasting methods by combining continuous tokenization, a Transformer backbone, and a novel probabilistic training objective. In contrast to discrete tokenization, Sundial uses continuous tokenization with native patching, which preserves the continuity of the series and enables more expressive representation learning. The model is trained with TimeFlow Loss, which learns predictive distributions without prior probabilistic assumptions. Trained on TimeBench, a large-scale dataset of one trillion time points drawn from real-world and synthetic time series, Sundial shows strong generalization across a wide range of forecasting tasks. It can generate multiple plausible future trajectories, improving decision-making under uncertainty.

💡 Sundial's continuous tokenization with native patching processes time series data as continuous segments rather than discrete categorical tokens, preserving temporal continuity and strengthening representation learning.

📈 TimeFlow Loss, a key innovation of Sundial, is a flow-matching-based generative training objective that lets the model learn predictive distributions without prior probabilistic assumptions, avoiding mode collapse and enabling forecasts of multiple plausible future trajectories.

🌐 Sundial is trained on TimeBench, a trillion-scale dataset spanning finance, weather, IoT, healthcare, and other domains, ensuring broad applicability and robustness and enabling strong zero-shot generalization.

🎯 Sundial achieves state-of-the-art results on a range of zero-shot forecasting benchmarks, consistently outperforming previous time series foundation models in long-term forecasting and ranking among the top models in probabilistic forecasting, while holding a significant advantage in inference speed.

Time series forecasting presents a fundamental challenge due to its intrinsic non-determinism, which makes accurately predicting future values difficult. Traditional methods generally employ point forecasting, producing a single deterministic value that cannot describe the range of possible outcomes. Although recent deep learning methods have improved forecasting precision, they require task-specific training and do not generalize to unseen distributions. Most models impose strict parametric assumptions or rely on discrete tokenization, which can give rise to out-of-vocabulary issues and quantization errors. Overcoming these constraints is key to building scalable, transferable, and generalizable time series forecasting models that function across domains without extensive retraining.

Current forecasting models fall roughly into two categories: statistical models and deep learning-based models. Statistical models such as ARIMA and Exponential Smoothing are interpretable but cannot capture the complex dependencies in large datasets. Transformer-based deep learning models display impressive predictive ability; however, they are not robust, require extensive in-distribution training, and depend heavily on discrete tokenization. This tokenization scheme, used in frameworks such as TimesFM, Timer, and Moirai, embeds time series data into categorical token sequences, which discards fine-grained information, makes representation learning rigid, and can introduce quantization inconsistencies. In addition, most forecasting models rely on prior probabilistic distributions, such as Gaussian priors, that limit their ability to capture the rich and highly variable nature of real-world data. These constraints prevent existing methods from providing accurate and reliable probabilistic forecasts that adequately reflect uncertainty in decision-making applications.
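The quantization error introduced by discrete tokenization can be made concrete with a small sketch. The binning scheme below is illustrative only and is not taken from TimesFM, Timer, or Moirai; it simply shows how mapping continuous values to a fixed vocabulary of bins loses fine-grained amplitude information.

```python
import numpy as np

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=256))          # a random-walk series

def discrete_tokenize(x, n_bins=32):
    """Map each value to the center of one of n_bins uniform bins."""
    lo, hi = x.min(), x.max()
    edges = np.linspace(lo, hi, n_bins + 1)
    ids = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers[ids]                            # reconstructed ("de-tokenized") series

reconstructed = discrete_tokenize(series)
quant_err = np.abs(series - reconstructed).mean()
print(f"mean absolute quantization error: {quant_err:.4f}")
```

Every value inside a bin collapses to the same token, so the error is bounded below only by the bin width; shrinking it requires a larger vocabulary, which is exactly the trade-off continuous tokenization avoids.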

To overcome these challenges, Sundial proposes a generative, scalable, and flexible time series foundation model that learns complex patterns directly from raw data. In contrast to discrete tokenization, it uses continuous tokenization with native patching, which maintains time series continuity and enables more expressive representation learning. One of the innovations behind its forecasting power is TimeFlow Loss, a flow-matching-based generative training objective that enables the model to learn predictive distributions without prior probabilistic assumptions. This approach avoids mode collapse and yields multiple plausible future trajectories instead of a single deterministic prediction. In addition, the model is trained on TimeBench, a large-scale dataset of one trillion time points sampled from real-world and synthetic time series, which endows it with strong generalization capabilities on a wide range of forecasting tasks.
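A minimal sketch of a flow-matching training objective in the spirit of TimeFlow Loss can illustrate the idea: instead of assuming a parametric output distribution, the model regresses the velocity field that transports noise to data along a simple interpolation path. The network, dimensions, and names here are illustrative assumptions, not Sundial's actual architecture, where the flow network is conditioned on Transformer hidden states.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Stand-in velocity network; a small MLP replaces the real backbone."""
    def __init__(self, dim=16, cond_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + cond_dim + 1, 64), nn.SiLU(), nn.Linear(64, dim)
        )

    def forward(self, x_t, cond, t):
        return self.net(torch.cat([x_t, cond, t], dim=-1))

def flow_matching_loss(model, target, cond):
    """Regress the velocity that transports noise to the data sample."""
    noise = torch.randn_like(target)
    t = torch.rand(target.shape[0], 1)            # flow time in [0, 1]
    x_t = (1 - t) * noise + t * target            # linear interpolation path
    v_target = target - noise                     # constant velocity of that path
    v_pred = model(x_t, cond, t)
    return ((v_pred - v_target) ** 2).mean()

model = VelocityNet()
target = torch.randn(8, 16)    # future patch values (batch of 8, illustrative)
cond = torch.randn(8, 32)      # conditioning features from the backbone
loss = flow_matching_loss(model, target, cond)
loss.backward()                # differentiable end to end
print(f"flow-matching loss: {loss.item():.4f}")
```

At inference, such a model generates samples by integrating the learned velocity field from noise, which is how multiple plausible trajectories arise from one trained network.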

Sundial combines several innovations in tokenization, architecture, and training. Its native patching-based continuous tokenization system processes time series data as continuous segments rather than segmenting them into discrete categorical tokens. A re-normalization method enhances generalizability by managing variability and distribution shifts in the data. The backbone is a decoder-only Transformer with causal self-attention and rotary position embeddings, which improve its handling of temporal dependencies. Training stability and inference efficiency are enhanced through Pre-LN, FlashAttention, and KV Cache optimizations. TimeFlow Loss enables probabilistic forecasting through flow-matching, allowing the model to learn non-parametric distributions without being constrained by fixed assumptions. Rather than producing a single-point estimate, the model produces multiple possible outcomes, improving decision-making in uncertain environments. Training is conducted on TimeBench, a trillion-scale dataset spanning finance, weather, IoT, healthcare, and more, ensuring wide applicability and robustness across domains.
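The tokenization step above can be sketched in a few lines. This is an illustrative reading of continuous tokenization with native patching: the series is instance-normalized and sliced into fixed-length patches of raw values, each patch serving as one continuous token. The per-instance mean/std normalization shown is a common choice and an assumption here, not necessarily Sundial's exact re-normalization scheme.

```python
import numpy as np

def patchify(series, patch_len=16):
    """Instance-normalize, then split into contiguous continuous-valued patches."""
    mean, std = series.mean(), series.std() + 1e-8
    normed = (series - mean) / std                    # re-normalization step
    n_patches = len(normed) // patch_len
    patches = normed[: n_patches * patch_len].reshape(n_patches, patch_len)
    return patches, (mean, std)                       # stats allow de-normalizing forecasts

rng = np.random.default_rng(0)
series = 50 + 5 * np.cumsum(rng.normal(size=128))     # synthetic series with offset/scale
patches, (mu, sigma) = patchify(series)
print(patches.shape)                                  # 8 continuous tokens of 16 values each
```

Because each token carries the raw (normalized) values, no vocabulary lookup or binning occurs, and the saved statistics let predictions be mapped back to the original scale.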

Sundial achieves state-of-the-art performance on a variety of zero-shot forecasting benchmarks, demonstrating superior accuracy, efficiency, and scalability. In long-term forecasting, it consistently outperforms previous time series foundation models, with substantial reductions in Mean Squared Error and Mean Absolute Error. In probabilistic forecasting, Sundial ranks among the top-performing models on key metrics such as MASE and CRPS while holding a substantial advantage in inference speed. The model scales well, with larger configurations yielding better accuracy, and TimeFlow Loss proves more effective than standard MSE- or diffusion-based objectives. Sundial also provides flexible inference, allowing users to trade off computational cost against forecasting accuracy, which makes it particularly useful for practical applications that require reliable and adaptive time series forecasts.
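CRPS, one of the probabilistic metrics mentioned above, is naturally estimated from sampled trajectories, which is how a generative forecaster like Sundial would be scored. The sketch below uses the standard sample-based estimator CRPS ≈ E|X − y| − ½·E|X − X′| on synthetic, illustrative data.

```python
import numpy as np

def crps_ensemble(samples, observation):
    """Sample-based CRPS estimate; samples are draws from the predictive distribution."""
    term1 = np.abs(samples - observation).mean()
    term2 = np.abs(samples[:, None] - samples[None, :]).mean()
    return term1 - 0.5 * term2

rng = np.random.default_rng(0)
obs = 1.0
sharp = rng.normal(loc=1.0, scale=0.1, size=500)   # accurate, confident forecast
wide = rng.normal(loc=1.0, scale=2.0, size=500)    # accurate but diffuse forecast
print(crps_ensemble(sharp, obs) < crps_ensemble(wide, obs))
```

Unlike MSE on a point estimate, CRPS rewards forecasts that are both accurate and sharp, so a model that concentrates probability mass near the true value scores better than an equally well-centered but diffuse one.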

Sundial is a significant breakthrough in time series forecasting: a generative modeling framework that combines continuous tokenization, Transformer models, and a novel probabilistic training objective. With TimeFlow Loss, it surpasses conventional parametric forecasting methods by learning highly flexible, unconstrained predictive distributions. Trained on the trillion-scale TimeBench dataset, it achieves state-of-the-art results on a variety of forecasting tasks with strong zero-shot generalization. Its ability to generate multiple plausible future trajectories, combined with its efficiency, makes it a powerful decision-making tool across many industries and expands the promise of time series foundation models.


Check out the Paper. All credit for this research goes to the researchers of this project.


