Mathematical and computational simulators are instrumental in modeling complex scientific and industrial problems. A prevalent difficulty across domains is identifying the parameters that lead to certain experimental observations. We consider a stochastic simulator $f(\theta) = x,$ capable of producing observations $x$ from a set of parameters $\theta.$ This simulator enables the generation of samples from the likelihood $p(x \mid \theta),$ which is typically either intractable or unavailable. Evaluating the simulator on various parameter configurations yields a dataset $\mathcal{D} = \{x_1, \dots, x_n\}$ and a corresponding empirical distribution $p(x_o).$ With Sourcerer, [Vet24S] introduce a strategy to determine a source distribution $q(\theta)$ whose push-forward through the simulator,

$$q^{\#}(x) = \int_{\Theta} p(x \mid \theta)\, q(\theta)\, d\theta,$$

matches the empirical distribution.

A well-known tactic for estimating the source distribution is empirical Bayes, which refines the parameters $\phi$ of the prior by maximizing the marginal likelihood:

$$p(\mathcal{D}) = \prod_i \int p(x_i \mid \theta)\, q_{\phi}(\theta)\, d\theta.$$

However, this approach is inadequate for simulators where the likelihood is unavailable or where the problem of parameter estimation is ill-posed.

Maximum Entropy Source Distribution Estimation

The authors choose between competing source distributions by taking the one that maximizes entropy. Intuitively, this is the one that embodies the greatest level of ignorance. The entropy $H(p)$ of a distribution $p$ is defined as $H(p) = -\int p(\theta)\, \log p(\theta)\, d\theta.$ To identify the maximum entropy source distribution, [Vet24S] propose to maximize $H(q),$ subject to the constraint that the push-forward distribution $\int p(x \mid \theta)\, q(\theta)\, d\theta$ equals the empirical distribution $p(x_o).$

Figure 1. [Vet24S], Figure 2. The Sourcerer framework optimizes the parameters $\phi$ of a parametric distribution $q_{\phi}(\theta)$ in order to maximize its entropy while yielding a push-forward distribution similar to the observed distribution w.r.t. the Sliced-Wasserstein distance.

As shown by the authors, maximizing the entropy of $q(\theta)$ yields a unique source distribution, if one exists. To implement it, they relax the functional equality constraint with a penalty term, leading to the unconstrained problem

$$\max_q \left\{ \lambda H(q) - (1 - \lambda) \log\left( D(q^{\#}, p_o)^2 \right) \right\},$$

where $D(q^{\#}, p_o)$ measures the discrepancy between the push-forward and the empirical distributions, and $\lambda$ controls the penalty strength. The authors propose to use the Sliced-Wasserstein distance, as it is sample-based and circumvents the direct evaluation of the likelihood. The logarithmic term enhances numerical stability.
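This relaxed objective can be estimated purely from samples. Below is a minimal sketch of one loss evaluation, assuming a differentiable simulator and, purely for illustration, a diagonal Gaussian source $q_{\phi}$ with closed-form entropy (the paper instead uses the unconstrained neural samplers of [Van21N], whose entropy is estimated from samples). The names `sliced_wasserstein_sq`, `sourcerer_style_loss`, `simulator`, and `lam` are our own placeholders, not the paper's API.

```python
import torch

def sliced_wasserstein_sq(x, y, n_proj=64):
    """Monte Carlo estimate of the squared Sliced-Wasserstein-2 distance
    between two sample sets x, y of identical shape (n, d)."""
    d = x.shape[1]
    proj = torch.randn(d, n_proj, device=x.device)   # random directions
    proj = proj / proj.norm(dim=0, keepdim=True)     # normalize to the unit sphere
    x_p, _ = torch.sort(x @ proj, dim=0)             # sorted 1-D projections
    y_p, _ = torch.sort(y @ proj, dim=0)
    return ((x_p - y_p) ** 2).mean()                 # average 1-D W2^2 over slices

def sourcerer_style_loss(mu, log_sigma, x_obs, simulator, lam=0.25):
    """Negative of the relaxed objective lam * H(q) - (1 - lam) * log(SW^2)."""
    q = torch.distributions.Independent(
        torch.distributions.Normal(mu, log_sigma.exp()), 1)
    theta = q.rsample((x_obs.shape[0],))   # reparameterized draws keep gradients
    x_push = simulator(theta)              # push-forward samples, same count as x_obs
    sw2 = sliced_wasserstein_sq(x_push, x_obs)
    return -(lam * q.entropy() - (1 - lam) * torch.log(sw2))
```

In practice one would minimize this loss over $\phi = (\mu, \log\sigma)$ with a stochastic gradient optimizer, drawing a minibatch of observations of matching size at every step; a larger $\lambda$ trades fidelity of the push-forward for higher entropy of the source.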
Incorporating the Bayesian perspective, with prior information about the source, the authors substitute the entropy term $H(q)$ with the Kullback-Leibler (KL) divergence between the estimated source distribution $q(\theta)$ and the initial prior $p(\theta).$ Doing so can be seen as regularizing the solution to stay close to the prior. Since the KL divergence decomposes into the entropy $H(q)$ and the cross-entropy $H(q, p),$ this new formulation remains amenable to sample-based estimation and preserves the original intention of covering a large portion of the parameter space. The optimization thus becomes a balance between the KL divergence and the discrepancy measure:

\begin{align}
& \lambda D_{\text{KL}}(q \Vert p) + (1 - \lambda) \log\left( D(q^{\#}, p_o)^2 \right) \\
= & -\lambda H(q) + \lambda H(q, p) + (1 - \lambda) \log\left( D(q^{\#}, p_o)^2 \right).
\end{align}

In the second line, the KL divergence is expressed in terms of the entropy and the cross-entropy between the source and the prior distribution. To approximate the source distribution $q(\theta),$ the authors utilize unconstrained artificial neural networks, as presented by [Van21N].

Numerical Experiments

Figure 3. [Vet24S], Figure 4. Comparison of the true and estimated source distributions (left) and of the observed data against the push-forward distribution (right).

The authors validate their method through detailed numerical examples. First, they benchmark their approach on the two moons, the inverse kinematics (IK), and the simple likelihood complex posterior (SLCP) tasks, all three presented by [Van21N] specifically for empirical Bayes. They further demonstrate the algorithm's effectiveness in complex scenarios with differentiable simulators, specifically the Lotka-Volterra and SIR models, showcasing the method's adaptability and strength in diverse simulation contexts. Finally, the authors apply Sourcerer to the Hodgkin-Huxley model, a well-known neuron model, to estimate the source distribution of its parameters. In all cases, they use the Classifier Two-Sample Test (C2ST) to evaluate the quality of the push-forward distributions obtained from the estimated source (Figure 2; see the sketch below for how such a test can be computed).

The numerical experiments showcase the method's ability to accurately estimate a source distribution that yields a push-forward distribution very close to the observed data. Furthermore, the estimated source distributions show a greater level of entropy, as desired (Figure 2).

Figure 2. [Vet24S], Figure 3. The choice of $\lambda$ is an important hyperparameter. The plots show the amount of entropy and the quality of the push-forward distribution w.r.t. the choice of $\lambda.$
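For a concrete idea of how a C2ST score can be computed from two sample sets, here is a minimal sketch using scikit-learn; the classifier choice and hyperparameters are illustrative assumptions, not the setup used in [Vet24S]. A cross-validated accuracy near 0.5 indicates that push-forward samples are hard to distinguish from the observed data, while values well above 0.5 signal a mismatch.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def c2st_accuracy(x_obs, x_push, seed=0):
    """Classifier two-sample test: cross-validated accuracy of a classifier
    trained to separate observed samples from push-forward samples."""
    X = np.concatenate([x_obs, x_push], axis=0)
    y = np.concatenate([np.zeros(len(x_obs)), np.ones(len(x_push))])
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=seed)
    return cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
```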