cs.AI updates on arXiv.org, April 1
How to safely discard features based on aggregate SHAP values

This paper examines the reliability of SHAP values for assessing global feature importance. SHAP is a widely used local feature-attribution method, but the authors show that judging feature importance from average SHAP values alone can be misleading: a feature's SHAP value can be 0 across the entire data support even though the function still depends on that feature. To address this, the authors propose aggregating SHAP values over the extended support and prove that, in this setting, a small aggregate SHAP value justifies safely discarding the corresponding feature. The results have both theoretical and practical implications for SHAP and KernelSHAP.

💡 SHAP (SHapley Additive exPlanations) is a popular local feature-attribution method that quantifies each feature's contribution to a function's output.
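
For reference, the standard (interventional) Shapley value underlying SHAP can be written as follows; this is the textbook definition, not a formula quoted from the paper. For a function $f$ on $d$ features and an input $x$,

$$\phi_i(x)=\sum_{S\subseteq[d]\setminus\{i\}}\frac{|S|!\,(d-|S|-1)!}{d!}\Bigl[v_x(S\cup\{i\})-v_x(S)\Bigr],\qquad v_x(S)=\mathbb{E}\bigl[f(x_S,X_{\bar S})\bigr],$$

where $x_S$ fixes the features in $S$ at their values in $x$ and the remaining features $X_{\bar S}$ are drawn from a background distribution. The aggregate importance discussed below is $I_i=\mathbb{E}_x\bigl[|\phi_i(x)|\bigr]$.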

⚠️ In current practice, global feature importance is often assessed by averaging the absolute SHAP values over many data points, and features with low scores are then discarded; this practice can be unsound.
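
A minimal sketch of that practice, assuming the shap package; predict_fn, X, and the threshold are illustrative placeholders, not names from the paper:

```python
import numpy as np
import shap

def global_importance(predict_fn, X, background):
    # KernelSHAP attributions for every row of X, against a background sample
    explainer = shap.KernelExplainer(predict_fn, background)
    phi = explainer.shap_values(X)        # shape (n_samples, n_features)
    return np.abs(phi).mean(axis=0)       # aggregate |SHAP| per feature

# scores = global_importance(model.predict, X, X[:100])
# keep = scores > 1e-3                    # low-scoring features get discarded
```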

🧐 The authors show that even if a feature's SHAP value is 0 over the entire data support, the function may still depend on that feature: computing SHAP values requires evaluating the function outside the data support, where it can be strategically designed to mask its dependence on that feature.
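
A self-contained toy illustrating this point (my own construction, in the spirit of the paper's argument): two perfectly correlated binary features with support {(0,0), (1,1)}. Off-support, f is chosen so that feature 2's exact interventional SHAP value is 0 at every support point, even though f plainly depends on feature 2:

```python
def f(x1, x2):
    # Equals x1 on the support (where x1 == x2), but depends on x2 off-support
    return x2 * (2 * x1 - 1)

p = 0.5  # P(x1 = x2 = 1); the data puts no mass on (0,1) or (1,0)

def v(x, S):
    # Interventional value function: fix the features in S at x,
    # average the remaining features over the data distribution
    total = 0.0
    for bg, w in [((1, 1), p), ((0, 0), 1 - p)]:
        z = list(bg)
        for i in S:
            z[i] = x[i]
        total += w * f(z[0], z[1])
    return total

for x in [(0, 0), (1, 1)]:  # exact SHAP value of feature 2 (d = 2)
    phi2 = 0.5 * ((v(x, [1]) - v(x, [])) + (v(x, [0, 1]) - v(x, [0])))
    print(x, phi2)          # 0.0 at both support points
```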

✅ To address this, the paper proposes aggregating SHAP values over the extended support, which is the product of the marginals of the underlying distribution.
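
In symbols (notation mine, following the description above): if the data distribution $P$ has marginals $P_1,\dots,P_d$, the extended support is

$$\mathrm{supp}_{\mathrm{ext}}(P)\;=\;\mathrm{supp}(P_1)\times\cdots\times\mathrm{supp}(P_d)\;=\;\mathrm{supp}\Bigl(\textstyle\prod_{j=1}^{d}P_j\Bigr),$$

i.e., every combination of individually observed feature values is included, even combinations that never co-occur in the data.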

🔑 The paper shows that, over the extended support, a small aggregate SHAP value implies the corresponding feature can be safely discarded.
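
Continuing the toy from above (same adversarial f, p = 0.5, so the extended support is all of {0,1}^2): aggregating over the extended support, with the product of marginals also used as the background, the dependence on feature 2 is no longer masked:

```python
from itertools import product

def f(x1, x2):                          # same adversarial function as above
    return x2 * (2 * x1 - 1)

ext = list(product([0, 1], repeat=2))   # extended support: all 4 points

def v_ext(x, S):
    # Value function with the product of marginals as the background
    vals = []
    for bg in ext:
        z = list(bg)
        for i in S:
            z[i] = x[i]
        vals.append(f(z[0], z[1]))
    return sum(vals) / len(vals)

agg = sum(abs(0.5 * ((v_ext(x, [1]) - v_ext(x, [])) +
                     (v_ext(x, [0, 1]) - v_ext(x, [0]))))
          for x in ext) / len(ext)
print(agg)   # 0.25 > 0: feature 2 is correctly flagged as relevant
```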

➕ The results extend to KernelSHAP, the most popular method for approximating SHAP values in practice: if KernelSHAP is computed over the extended distribution, a small aggregate value likewise justifies feature removal (see the sketch below). Notably, this holds regardless of whether KernelSHAP accurately approximates the true SHAP values.

💡 The paper also introduces the Shapley Lie algebra, which provides algebraic insight that may enable deeper study of SHAP, and shows that randomly permuting each column of the data matrix enables safely discarding features based on aggregate SHAP and KernelSHAP values.
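
A hedged sketch of that recipe: independently permuting each column of the data matrix gives an (approximate) sample from the product of the marginals, which can then serve as both the background and the evaluation points for KernelSHAP. This assumes the shap package; predict_fn is an illustrative placeholder:

```python
import numpy as np
import shap

rng = np.random.default_rng(0)

def extend_support(X):
    # Independently permuting each column breaks inter-feature dependence,
    # approximating the extended (product-of-marginals) distribution
    return np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])

# X_ext = extend_support(X)
# explainer = shap.KernelExplainer(predict_fn, X_ext)  # extended background
# phi = explainer.shap_values(X_ext)                   # (n, d) attributions
# scores = np.abs(phi).mean(axis=0)  # small scores now justify discarding
```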

arXiv:2503.23111v1 | Announce Type: cross

Abstract: SHAP is one of the most popular local feature-attribution methods. Given a function f and an input x, it quantifies each feature's contribution to f(x). Recently, SHAP has been increasingly used for global insights: practitioners average the absolute SHAP values over many data points to compute global feature importance scores, which are then used to discard unimportant features. In this work, we investigate the soundness of this practice by asking whether small aggregate SHAP values necessarily imply that the corresponding feature does not affect the function. Unfortunately, the answer is no: even if the i-th SHAP value is 0 on the entire data support, there exist functions that clearly depend on Feature i. The issue is that computing SHAP values involves evaluating f on points outside of the data support, where f can be strategically designed to mask its dependence on Feature i. To address this, we propose to aggregate SHAP values over the extended support, which is the product of the marginals of the underlying distribution. With this modification, we show that a small aggregate SHAP value implies that we can safely discard the corresponding feature. We then extend our results to KernelSHAP, the most popular method to approximate SHAP values in practice. We show that if KernelSHAP is computed over the extended distribution, a small aggregate value justifies feature removal. This result holds independently of whether KernelSHAP accurately approximates true SHAP values, making it one of the first theoretical results to characterize the KernelSHAP algorithm itself. Our findings have both theoretical and practical implications. We introduce the Shapley Lie algebra, which offers algebraic insights that may enable a deeper investigation of SHAP and we show that randomly permuting each column of the data matrix enables safely discarding features based on aggregate SHAP and KernelSHAP values.

Related tags

SHAP · Feature importance · Machine learning · Interpretability