In simulation-based inference (SBI), conditional density estimators are used to approximate the posterior distribution from simulated data. Score-based diffusion models have shown remarkable success in generative modeling, and they can also be framed as conditional density estimators. This paper introduces score-based diffusion models for SBI. On a set of SBI benchmarking tasks, they perform similarly to or better than existing methods.

Many recent neural network-based methods in simulation-based inference (SBI) use normalizing flows as conditional density estimators. However, density estimation for SBI is not limited to normalizing flows; one could use any flexible density estimator. For example, continuous normalizing flows trained with flow matching have recently been introduced for SBI [Wil23F]. Given the success of score-based diffusion models [Son19G], it seems promising to apply them to SBI as well, which is the primary focus of the paper presented here [Sha22S].

## Diffusion models for SBI

The idea of diffusion models is to learn sampling from a target distribution, here $p(\cdot \mid x)$, by gradually adding noise to samples $\theta_0 \sim p(\cdot \mid x)$ until they converge to a stationary distribution $\pi$ that is easy to sample, e.g., a standard Gaussian. On the way, one learns to systematically reverse this process of diffusing the data. Subsequently, it becomes possible to sample from the simple noise distribution and to gradually transform the noise sample back into a sample from the target distribution.

More formally, the forward noising process $(\theta_t)_{t \in [0, T]}$ can be defined as a stochastic differential equation (SDE)

$$d\theta_t = f_t(\theta_t)\,dt + g_t(\theta_t)\,dw_t,$$

where $f_t: \mathbb{R}^d \to \mathbb{R}^d$ is the drift coefficient, $g_t: \mathbb{R}^d \to \mathbb{R}^{d \times d}$ is the diffusion coefficient, and $w_t$ is a standard $\mathbb{R}^d$-valued Brownian motion.

Under mild conditions, the time-reversed process $(\bar{\theta}_t)_{t \in [0, T]} := (\theta_{T-t})_{t \in [0, T]}$ is also a diffusion process [And82R], evolving according to

$$d\bar{\theta}_t = \left[-f_{T-t}(\bar{\theta}_t) + g^2_{T-t}(\bar{\theta}_t)\,\nabla_{\theta}\log p_{T-t}(\bar{\theta}_t \mid x)\right]dt + g_{T-t}(\bar{\theta}_t)\,dw_t.$$

Given these two processes, one can diffuse a data point $\theta_0 \sim p(\cdot \mid x)$ into noise $\theta_T \sim \pi$ by running the forward noising process, and reconstruct it as $\bar{\theta}_T \sim p(\cdot \mid x)$ using the time-reversed denoising process. In the SBI setting, we do not have access to the scores of the true posterior, $\nabla_{\theta}\log p_t(\theta_t \mid x)$. However, we can approximate the scores using score-matching [Son21S].

## Score-matching for SBI

One way to perform score-matching is to train a time-varying score network $s_{\psi}(\theta_t, x, t) \approx \nabla_\theta \log p_t(\theta_t \mid x)$ that approximates the score of the perturbed posterior. This network can be optimized to match the unknown posterior by minimizing the conditional denoising posterior score-matching objective

$$\mathcal{J}^{\mathrm{DSM}}_{\mathrm{post}}(\psi) = \frac{1}{2}\int_0^T \lambda_t\, \mathbb{E}\left[\left\| s_{\psi}(\theta_t, x, t) - \nabla_\theta \log p_{t|0}(\theta_t \mid \theta_0)\right\|^2\right] dt.$$

Note that this objective does not require access to the actual score function of the posterior $\nabla_\theta \log p_t(\theta_t \mid x)$, but only to that of the transition density $p_{t|0}(\theta_t \mid \theta_0)$, which is defined by the forward noising process (see the paper for details).
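To make this concrete, here is a minimal PyTorch sketch (not the authors' implementation) of the transition density and the resulting denoising score-matching loss, assuming a variance-preserving forward SDE on $t \in (0, 1]$ with a closed-form Gaussian transition kernel $p_{t|0}(\theta_t \mid \theta_0) = \mathcal{N}(\alpha_t \theta_0, \sigma_t^2 I)$; the noise schedule, the weighting $\lambda_t = \sigma_t^2$, and the `score_net` interface are illustrative choices.

```python
import torch


def vp_perturbation(theta0, t, beta_min=0.1, beta_max=20.0):
    """Sample theta_t ~ p_{t|0}(. | theta_0) for a variance-preserving SDE.

    The transition kernel is Gaussian with closed-form mean scale alpha_t and
    std sigma_t, so no SDE solver is needed during training.
    """
    log_alpha = -0.25 * t**2 * (beta_max - beta_min) - 0.5 * t * beta_min
    alpha_t = torch.exp(log_alpha)[:, None]      # mean scaling, shape (batch, 1)
    sigma_t = torch.sqrt(1.0 - alpha_t**2)       # noise std, shape (batch, 1)
    eps = torch.randn_like(theta0)
    theta_t = alpha_t * theta0 + sigma_t * eps
    # Score of the Gaussian transition kernel (the regression target):
    # grad log p_{t|0}(theta_t | theta_0) = -(theta_t - alpha_t * theta_0) / sigma_t^2
    target_score = -eps / sigma_t
    return theta_t, target_score, sigma_t


def dsm_loss(score_net, theta0, x):
    """Monte Carlo estimate of the denoising posterior score-matching objective
    for a batch of prior draws theta0 and corresponding simulator outputs x."""
    t = 1e-3 + (1.0 - 1e-3) * torch.rand(theta0.shape[0])  # avoid t = 0, where sigma_t = 0
    theta_t, target_score, sigma_t = vp_perturbation(theta0, t)
    pred = score_net(theta_t, x, t)              # s_psi(theta_t, x, t)
    # lambda_t = sigma_t^2 keeps the loss on a comparable scale across times t.
    return 0.5 * (sigma_t**2 * (pred - target_score) ** 2).sum(-1).mean()
```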
The expectation is taken over $p_{t|0}(\theta_t \mid \theta_0)\, p(x \mid \theta_0)\, p(\theta_0)$, i.e., over samples from the forward noising process, samples from the likelihood (the simulator), and samples from the prior. Thus, the training routine for performing score-matching in the SBI setting amounts to:

1. Draw samples $\theta_0 \sim p(\theta)$ from the prior, simulate $x \sim p(x \mid \theta_0)$ from the likelihood, and obtain $\theta_t \sim p_{t|0}(\theta_t \mid \theta_0)$ using the forward noising process.
2. Use these samples to train the time-varying score network, minimizing a Monte Carlo estimate of the denoising score-matching objective.
3. Generate samples from the approximate score-matching posterior $\bar{\theta}_T \sim p(\theta \mid x_o)$ by sampling $\bar{\theta}_0 \sim \pi$ from the noise distribution and plugging $\nabla_\theta \log p_t(\theta_t \mid x_o) \approx s_{\psi}(\theta_t, x_o, t)$ into the reverse-time process to obtain $\bar{\theta}_T$.

The authors call their approach neural posterior score estimation (NPSE). In a similar vein, score-matching can be used to approximate the likelihood $p(x \mid \theta)$, resulting in neural likelihood score estimation (NLSE), which requires additional sampling via MCMC or VI.

## Sequential neural score estimation

Neural posterior estimation enables amortized inference: once trained, the conditional density estimator can be applied to various $x_o$ to obtain corresponding posterior approximations with a single forward pass through the network. In some scenarios, amortization is an excellent property. However, if simulators are computationally expensive and one is interested in only a particular observation $x_o$, sequential SBI methods can help to explore the parameter space more efficiently, obtaining a better posterior approximation with fewer simulations.

The idea of sequential SBI methods is to extend the inference over multiple rounds: in the first round, training data comes from the prior. In subsequent rounds, a proposal distribution tailored to be informative about $x_o$ is used instead, e.g., the current posterior estimate. Because samples in those rounds do not come from the prior, the resulting posterior will not be the desired posterior but the proposal posterior. Several variants of sequential neural posterior estimation have been proposed, each with its own strategy for correcting this mismatch to recover the actual posterior (see [Lue21B] for an overview). [Sha22S] present score-matching variants both for sequential NPE (similar to the one proposed in [Gre19A]) and for sequential NLE.

## Empirical results

*Figure 1 (from [Sha22S], Figure 2). Posterior accuracy of various SBI methods on four benchmarking tasks, measured as classifier two-sample test accuracy (C2ST; 0.5 is best).*

The authors evaluate their approach on a set of four SBI benchmarking tasks [Lue21B]. They find that score-based methods for SBI perform on par with and, in some cases, better than existing flow-based SBI methods (Figure 1).

## Limitations

With score-based diffusion models, this paper presented a potent conditional density estimator for SBI. It demonstrated performance similar to that of existing SBI methods on a subset of benchmarking tasks, particularly when simulation budgets were low, such as in the two moons task. However, the authors did not extend their evaluation to real-world SBI problems, which are typically more high-dimensional and complex than the benchmarking tasks.

It is important to note that diffusion models can be more computationally intensive at inference time than existing methods. For instance, while normalizing flows can be sampled and evaluated with a single forward pass through the neural network, diffusion models require solving an SDE to obtain samples or log probabilities from the posterior.
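To make this trade-off concrete, the sketch below (again illustrative, using the same variance-preserving SDE and time horizon as in the training sketch above, with `x_o` assumed to be a 1-D observation tensor) draws approximate posterior samples by discretizing the reverse-time SDE with an Euler-Maruyama scheme; every one of the `n_steps` iterations requires a forward pass through the score network, in contrast to the single pass of a normalizing flow.

```python
import torch


@torch.no_grad()
def sample_posterior(score_net, x_o, dim, n_samples=1000, n_steps=500,
                     beta_min=0.1, beta_max=20.0):
    """Draw approximate posterior samples theta ~ p(theta | x_o) by integrating
    the reverse-time SDE backwards from t = 1 to t = 0 (Euler-Maruyama)."""
    theta = torch.randn(n_samples, dim)      # theta_T ~ N(0, I), the noise distribution pi
    x = x_o.expand(n_samples, -1)            # broadcast the observation to the batch
    dt = 1.0 / n_steps
    for i in range(n_steps, 0, -1):
        t = torch.full((n_samples,), i * dt)
        beta_t = (beta_min + t * (beta_max - beta_min))[:, None]  # beta(t) of the VP-SDE
        score = score_net(theta, x, t)       # s_psi approximates the posterior score
        drift = -0.5 * beta_t * theta - beta_t * score            # f_t - g_t^2 * score
        noise = torch.randn_like(theta) if i > 1 else torch.zeros_like(theta)
        theta = theta - drift * dt + torch.sqrt(beta_t) * (dt ** 0.5) * noise
    return theta
```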
Therefore, akin to flow-matching methods, score-matching methods represent promising new tools for SBI, but they imply a trade-off at inference time that will depend on the specific problem.