In simulation-based inference (SBI), conditional density estimators are used to approximate the posterior distribution from simulated data. Score-based diffusion models have shown remarkable success in generative modeling, and they can also be framed as conditional density estimators. This paper introduces score-based diffusion models for SBI. On a set of SBI benchmarking tasks, they perform similarly to or better than existing methods.

Many recent neural network-based methods in simulation-based inference (SBI) use normalizing flows as conditional density estimators. However, density estimation for SBI is not limited to normalizing flows; one could use any flexible density estimator. For example, continuous normalizing flows trained with flow matching have recently been introduced for SBI [Wil23F]. Given the success of score-based diffusion models [Son19G], it seems promising to apply them to SBI as well, which is the primary focus of the paper presented here [Sha22S].

## Diffusion models for SBI

The idea of diffusion models is to learn sampling from a target distribution, here $p(\cdot \mid x)$, by gradually adding noise to samples $\theta_0 \sim p(\cdot \mid x)$ until they converge to a stationary distribution $\pi$ that is easy to sample, e.g., a standard Gaussian. On the way, one learns to systematically reverse this process of diffusing the data. Subsequently, it becomes possible to sample from the simple noise distribution and to gradually transform the noise sample back into a sample from the target distribution.

More formally, the forward noising process $(\theta_t)_{t \in [0, T]}$ can be defined as a stochastic differential equation (SDE)

$$d\theta_t = f_t(\theta_t)\,dt + g_t(\theta_t)\,dw_t,$$

where $f_t: \mathbb{R}^d \to \mathbb{R}^d$ is the drift coefficient, $g_t: \mathbb{R}^d \to \mathbb{R}^{d \times d}$ is the diffusion coefficient, and $w_t$ is a standard $\mathbb{R}^d$-valued Brownian motion.

Under mild conditions, the time-reversed process $(\bar{\theta}_t)_{t \in [0, T]} := (\theta_{T-t})_{t \in [0, T]}$ is also a diffusion process [And82R], evolving according to

$$d\bar{\theta}_t = \left[-f_{T-t}(\bar{\theta}_t) + g^2_{T-t}(\bar{\theta}_t)\,\nabla_{\theta}\log p_{T-t}(\bar{\theta}_t \mid x)\right]dt + g_{T-t}(\bar{\theta}_t)\,dw_t.$$

Given these two processes, one can diffuse a data point $\theta_0 \sim p(\cdot \mid x)$ into noise $\theta_T \sim \pi$ by running the forward noising process, and reconstruct it as $\bar{\theta}_T \sim p(\cdot \mid x)$ using the time-reversed denoising process. In the SBI setting, we do not have access to the scores of the true posterior, $\nabla_{\theta}\log p_t(\theta_t \mid x)$. However, we can approximate the scores using score-matching [Son21S].

## Score-matching for SBI

One way to perform score-matching is to train a time-varying score network $s_{\psi}(\theta_t, x, t) \approx \nabla_\theta \log p_t(\theta_t \mid x)$ that approximates the score of the perturbed posterior. This network can be optimized to match the unknown posterior by minimizing the conditional denoising posterior score-matching objective

$$\mathcal{J}^{\mathrm{DSM}}_{\mathrm{post}}(\psi) = \frac{1}{2}\int_0^T \lambda_t\, \mathbb{E}\left[\left\| s_{\psi}(\theta_t, x, t) - \nabla_\theta \log p_{t|0}(\theta_t \mid \theta_0)\right\|^2\right] dt.$$

Note that this objective does not require access to the actual score function of the posterior $\nabla_\theta \log p_t(\theta_t \mid x)$, but only to that of the transition density $p_{t|0}(\theta_t \mid \theta_0)$, which is defined by the forward noising process (see the paper for details).
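To make this concrete, here is a minimal PyTorch sketch (not the authors' implementation) of the transition density and the resulting denoising score-matching loss, assuming a variance-preserving forward SDE on $t \in (0, 1]$ with a closed-form Gaussian transition kernel $p_{t|0}(\theta_t \mid \theta_0) = \mathcal{N}(\alpha_t \theta_0, \sigma_t^2 I)$; the noise schedule, the weighting $\lambda_t = \sigma_t^2$, and the `score_net` interface are illustrative choices.

```python
import torch


def vp_perturbation(theta0, t, beta_min=0.1, beta_max=20.0):
    """Sample theta_t ~ p_{t|0}(. | theta_0) for a variance-preserving SDE.

    The transition kernel is Gaussian with closed-form mean scale alpha_t and
    std sigma_t, so no SDE solver is needed during training.
    """
    log_alpha = -0.25 * t**2 * (beta_max - beta_min) - 0.5 * t * beta_min
    alpha_t = torch.exp(log_alpha)[:, None]      # mean scaling, shape (batch, 1)
    sigma_t = torch.sqrt(1.0 - alpha_t**2)       # noise std, shape (batch, 1)
    eps = torch.randn_like(theta0)
    theta_t = alpha_t * theta0 + sigma_t * eps
    # Score of the Gaussian transition kernel (the regression target):
    # grad log p_{t|0}(theta_t | theta_0) = -(theta_t - alpha_t * theta_0) / sigma_t^2
    target_score = -eps / sigma_t
    return theta_t, target_score, sigma_t


def dsm_loss(score_net, theta0, x):
    """Monte Carlo estimate of the denoising posterior score-matching objective
    for a batch of prior draws theta0 and corresponding simulator outputs x."""
    t = 1e-3 + (1.0 - 1e-3) * torch.rand(theta0.shape[0])  # avoid t = 0, where sigma_t = 0
    theta_t, target_score, sigma_t = vp_perturbation(theta0, t)
    pred = score_net(theta_t, x, t)              # s_psi(theta_t, x, t)
    # lambda_t = sigma_t^2 keeps the loss on a comparable scale across times t.
    return 0.5 * (sigma_t**2 * (pred - target_score) ** 2).sum(-1).mean()
```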
The expectation is taken over $p_{t|0}(\theta_t \mid \theta_0)\, p(x \mid \theta_0)\, p(\theta_0)$, i.e., over samples from the forward noising process, samples from the likelihood (the simulator), and samples from the prior. Thus, the training routine for performing score-matching in the SBI setting amounts to:

1. Draw samples $\theta_0 \sim p(\theta)$ from the prior, simulate $x \sim p(x \mid \theta_0)$ from the likelihood, and obtain $\theta_t \sim p_{t|0}(\theta_t \mid \theta_0)$ using the forward noising process.
2. Use these samples to train the time-varying score network, minimizing a Monte Carlo estimate of the denoising score-matching objective.
3. Generate samples from the approximate score-matching posterior $\bar{\theta}_T \sim p(\theta \mid x_o)$ by sampling $\bar{\theta}_0 \sim \pi$ from the noise distribution and plugging $\nabla_\theta \log p_t(\theta_t \mid x_o) \approx s_{\psi}(\theta_t, x_o, t)$ into the reverse-time process to obtain $\bar{\theta}_T$.

The authors call their approach neural posterior score estimation (NPSE). In a similar vein, score-matching can be used to approximate the likelihood $p(x \mid \theta)$, resulting in neural likelihood score estimation (NLSE), which requires additional sampling via MCMC or VI.

## Sequential neural score estimation

Neural posterior estimation enables amortized inference: once trained, the conditional density estimator can be applied to various $x_o$ to obtain corresponding posterior approximations with a single forward pass through the network. In some scenarios, amortization is an excellent property. However, if simulators are computationally expensive and one is interested in only a particular observation $x_o$, sequential SBI methods can help to explore the parameter space more efficiently, obtaining a better posterior approximation with fewer simulations.

The idea of sequential SBI methods is to extend the inference over multiple rounds: in the first round, training data comes from the prior. In subsequent rounds, a proposal distribution tailored to be informative about $x_o$ is used instead, e.g., the current posterior estimate. Because samples in those rounds do not come from the prior, the resulting posterior will not be the desired posterior but the proposal posterior. Several variants of sequential neural posterior estimation have been proposed, each with its own strategy for correcting this mismatch to recover the actual posterior (see [Lue21B] for an overview). [Sha22S] present score-matching variants both for sequential NPE (similar to the one proposed in [Gre19A]) and for sequential NLE.

## Empirical results

*Figure 1 (from [Sha22S], Figure 2). Posterior accuracy of various SBI methods on four benchmarking tasks, measured as classifier two-sample test accuracy (C2ST; 0.5 is best).*

The authors evaluate their approach on a set of four SBI benchmarking tasks [Lue21B]. They find that score-based methods for SBI perform on par with and, in some cases, better than existing flow-based SBI methods (Figure 1).

## Limitations

With score-based diffusion models, this paper presented a potent conditional density estimator for SBI. It demonstrated performance similar to that of existing SBI methods on a subset of benchmarking tasks, particularly when simulation budgets were low, such as in the two moons task. However, the authors did not extend their evaluation to real-world SBI problems, which are typically more high-dimensional and complex than the benchmarking tasks.

It is important to note that diffusion models can be more computationally intensive at inference time than existing methods. For instance, while normalizing flows can be sampled and evaluated with a single forward pass through the neural network, diffusion models require solving an SDE to obtain samples or log probabilities from the posterior.
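To make this trade-off concrete, the sketch below (again illustrative, using the same variance-preserving SDE and time horizon as in the training sketch above, with `x_o` assumed to be a 1-D observation tensor) draws approximate posterior samples by discretizing the reverse-time SDE with an Euler-Maruyama scheme; every one of the `n_steps` iterations requires a forward pass through the score network, in contrast to the single pass of a normalizing flow.

```python
import torch


@torch.no_grad()
def sample_posterior(score_net, x_o, dim, n_samples=1000, n_steps=500,
                     beta_min=0.1, beta_max=20.0):
    """Draw approximate posterior samples theta ~ p(theta | x_o) by integrating
    the reverse-time SDE backwards from t = 1 to t = 0 (Euler-Maruyama)."""
    theta = torch.randn(n_samples, dim)      # theta_T ~ N(0, I), the noise distribution pi
    x = x_o.expand(n_samples, -1)            # broadcast the observation to the batch
    dt = 1.0 / n_steps
    for i in range(n_steps, 0, -1):
        t = torch.full((n_samples,), i * dt)
        beta_t = (beta_min + t * (beta_max - beta_min))[:, None]  # beta(t) of the VP-SDE
        score = score_net(theta, x, t)       # s_psi approximates the posterior score
        drift = -0.5 * beta_t * theta - beta_t * score            # f_t - g_t^2 * score
        noise = torch.randn_like(theta) if i > 1 else torch.zeros_like(theta)
        theta = theta - drift * dt + torch.sqrt(beta_t) * (dt ** 0.5) * noise
    return theta
```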
Therefore, akin to flow-matching methods, score-matching methods represent promising new tools for SBI, but they imply a trade-off at inference time that will depend on the specific problem.