Variance Reduced Shapley Value Estimation for Trustworthy Data Valuation

Content feed of the TransferLab — appliedAI Institute 2024年11月27日

Variance Reduced Shapley Value Estimation for Trustworthy Data Valuation

VRDS是一种有效降低数据Shapley方差的方法，它基于分层抽样技术并预先计算系数。在之前的研究中，Beta-Shapley和Data Banzhaf等数据估值方法被提出用于解决效用评估中的噪声问题。本文则关注于通过解决方差问题来提高数据估值结果的质量，采用了类似于先前文献中方法的分层抽样技术，并将其命名为VRDS（Variance Reduce Data Shapley）。VRDS从Shapley值的组合公式出发，将联盟根据其规模划分为不同的层，从而在计算边际贡献时提高效率和准确性。

🤔VRDS方法旨在降低数据Shapley值的方差，从而提高数据估值结果的稳健性，避免噪声干扰。

🚀VRDS的核心思想是利用分层抽样技术，将所有可能的联盟根据其规模划分为不同的层，并在每个层中进行独立的抽样。

📊VRDS方法预先计算了每个层对应的系数，从而可以更有效地估计Shapley值，减少计算量。

💡VRDS方法借鉴了先前文献中关于分层抽样的研究成果，并将其应用于数据Shapley值的计算。

🧮VRDS的计算公式基于Shapley值的组合公式，通过将联盟划分为不同的层，可以更有效地计算边际贡献。

VRDS, an effective method to reduce the variance of Data Shapley, based on a stratified sampling technique with precomputed coefficients.In two of our previous paper pills about Data Valuation methods, namely Beta-Shapley and Data Banzhaf, we discussed how to make thecomputed values more robust to noise in the utility evaluation.Whereas [Kwo22B] and [Wan23D] addressthe noise in stochastic utilities, in the paper we study here, [Wu23V], the authors improve the quality of the values bytackling a different problem. They reduce the variance of the Monte Carloestimate using stratified sampling, similarly to previous work in [Mal14B] and [Cas17I]. They call theirmethod Variance Reduce Data Shapley (VRDS).Starting with the combinatorial formulation of Shapley values, one can dividethe coalitions into strata according to their size when calculating the marginalcontributions:$$\begin{align}v(xi)&= \frac{1}{n} \sum{S \subseteq D \setminus{x_i}, |S| = k} \binom{n-1}{k}^{-1} \left[u(S \cup {xi}) − u(S)\right]\&= \frac{1}{n} \sum{k=0}^{n-1} \binom{n-1}{k}^{-1} \sum_{S \subseteq D \setminus{x_i}, |S| = k} \left[u(S \cup {x_i}) − u(S)\right].\end{align}$$Figure 1: Example of stratified samplingfor the set ${1, 2, 3, 4}.$Grouping by size the coalitions which do not contain the data point $i,$we have $n$ strata $S_0, S1, \dots , S{n−1}$ such that$S_k = {S \subseteq N \ {i}, |S| = k}$ contains all the coalitions with size $k.$The marginal contribution of data point $i$ within stratum $S_k$ is:$$v^k(xi) = \binom{n-1}{k}^{-1} \sum{S \subseteq D \setminus{x_i}, |S| = k} \left[u(S \cup {x_i}) − u(S)\right],$$and with it, the Shapley value of point $i$ can be written as:$$v(xi) = \frac{1}{n} \sum{k=0}^{n-1} v^k(x_i).$$A standard Monte Carlo approximation (to reduce the number of evaluations of $u$) isthen:$$\hat{v}(xi)= \frac{1}{n} \sum{k=0}^{n-1} \hat{v}^k(xi)= \frac{1}{n} \sum{k=0}^{n-1} \frac{1}{mk} \sum{j=1}^{m_k}\left[u(S^k_j \cup {x_i}) − u(S^k_j)\right],$$where $S^k_j$ is the $j$th sample coalition in stratum $k$ and $m_k$is the number of samples in the stratum. Grouping coalitions by size intostrata allows us to optimise the number of samples taken from each stratum.In Theorem 4.1 of [Wu23V], the authors show that thechoice of $mk$ which optimally reduces variance for a given number of samples$m$ is proportional to $\sigma{i,k},$ the standard deviation of$\hat{v}^k(x_i)$:$$\tilde{m}k = m\frac{\sigma{i,k}}{\sum{j=0}^{n−1} \sigma{i,j}},\quad k = 0, \dots , n − 1.$$Because the variance of the estimators $\hat{v}^k(xi)$ is unknown, the authorssuggest substituting a general $f(k)$ for $\sigma{i,k}$ and propose a fewdifferent possibilities. They pick$$f(k) = (k+1)^a, \quad a < 0,$$a choice that, when plugged into the formula above, is a generalization of theoptimal value obtained in [Mal14B]. 11 Contrary to VRDS, this is derived from aHoeffding bound on the quality of the estimator, after solving a simpleminimization problem, leading to a specific choice ofexponent $a =-\frac{2}{3},$ cf [Mal14B] Eq. (14).Taking into account that sample size should be an integer, we can set the valueof $m_k = \min(1, \lfloor \tilde{m}k \rfloor).$ However, this implies thatsome samples may be left unused as $\sum{k=0}^{n-1} m_k \le m.$ In that case,the authors sequentially increase the value of $m_k$ from $k = 0$ to $n − 1$until the sum exceeds $m.$We note that previously, [Cas17I] used estimatedstandard deviations $\tilde{\sigma}{i,k}$ instead:$$\tilde{m}{i,k} = m \frac{\tilde{\sigma}{i,k}^2}{\sum{j=0}^{n-1} \sum{l=0}^{n-1} \tilde{\sigma}{j,l}^2},\quad k = 0, \dots , n − 1.$$Sadly, all of these approaches assume that the number of samplesis known beforehand which is almost never the case in practice.To empirically demonstrate that the stratified sampling algorithmfor VRDS can estimate the exact data Shapley values with a variance smallerthan that obtained by the permutation sampling algorithm,the authors then run a few different experiments on a plethora of datasets:Comparison of variance of VRDS and Permutation sampling:Figure 2: Data Shapley valuesof 20 randomly selected data points from the FashionMNIST datasetand their estimates. The experiment is repeated 30 times to estimatevalues.From Figure 2, we can clearly see that VRDS’svariance is lower than Permutation sampling. However, in many data valuationapplications we are more interested in the order or ranking thanin the actual value, and it is not clear whether the rank of VRDS valuesremains truly stable across experiments.The authors should have added an experiment to evaluate the ranking stabilityof VRDS, similar to what has been done inData Banzhaf [Wan23D].Performance of VRDS using different exponents $a:$Figure 3: Variance of VRDS when $a$ takesdifferent values in the range $[−3, 3].$ The dataset is FashionMNIST,the models are KNN and LR and the data sizes are 100 and 30, respectively.Figure 3 shows that VRDS consistently achieves lower variancethan permutation sampling, and that $a=-1$ (black, bottom line) seems to yieldthe best result. It would have been interesting to see how the choice in[Mal14B] of $a=-\frac{2}{3}$ would have performed.Removal of best and worst data groups:Figure 4: Data group removal experiment on theFashionMNIST data set. There are 20 groups and each group has 100 data points.VRDS ($a = −\frac{1}{2}$) and permutation sampling algorithms are used tocalculate the value of data groups. (a) shows the removal of the most valuabledata groups according to the estimated data Shapley values. (b) shows theremoval of the most valuable data groups where the estimated data Shapley values$v_i$ have been shifted by the variances $\sigma^2_i$ with $−100 \sigma^2_i.$The plots in Figure 4 don’t clearly show that VRDS performsbetter than permutation sampling for point removal tasks, and, crucially, lackconfidence intervals. The similar performance could be due to the fact that eventhough VRDS’ variance is lower it doesn’t necessarily converge to correctvalues, which may indicate that the estimation is biased. It would also havebeen beneficial to compare it to other methods such as Data Banzhaf.Alas, only the first experiment includes confidence intervals. Without moredetails and a study of the quality of the approximation, possibly using rankstability statistics, it is hard to decide on the interest of VRDS in practice.The experiments in general lack details and no code was provided to reproducetheir results.However, the use of stratified sampling seems like a worthwhile additionto the data Shapley methods as the authors wrote in their conclusion: it wouldbe interesting to apply it to other data valuations methods that relyon sampling to estimate their values (e.g. Beta Shapley, Data Banzhaf,Truncated Monte Carlo Shapley).If you are interested in testing out data valuation algorithms considerusing the TransferLab’s pyDVL package for datavaluation. To try it out, install with pip install pydvl andread the documentation.

Fish AI Reader

AI辅助创作，多种专业模板，深度分析，高质量内容生成。从观点提取到深度思考，FishAI为您提供全方位的创作支持。新版本引入自定义参数，让您的创作更加个性化和精准。

FishAI

鱼阅，AI 时代的下一个智能信息助手，助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

VRDS 数据Shapley 分层抽样方差缩减数据估值

相关文章

Attributing Data for Sharpness-Aware Minimization