MarkTechPost@AI · 19 hours ago
How to Use the SHAP-IQ Package to Uncover and Visualize Feature Interactions in Machine Learning Models Using Shapley Interaction Indices (SII)

This tutorial shows how to use the SHAP-IQ package to uncover and visualize feature interactions in machine learning models based on Shapley Interaction Indices (SII). The article first explains the role of Shapley values in model explanation and points out their limitations in capturing feature interactions. Using the Bike Sharing dataset, it then demonstrates how to install the dependencies, load and preprocess the data, train a random forest model, and evaluate its performance. The focus is on setting up a TabularExplainer, computing feature interaction values up to fourth order with the k-SII method, and analyzing the local explanation of a specific instance. Finally, a waterfall chart breaks down each feature's contribution to the model's prediction, clearly showing how features such as temperature and year strongly influence the result and providing valuable insight into the model's decisions.

🔹 The SHAP-IQ package extends traditional Shapley values with Shapley Interaction Indices (SII), quantifying and visualizing interactions between features and thereby providing deeper model explanations than Shapley values alone. For example, it can reveal how longitude and latitude jointly influence house prices, something a single Shapley value cannot capture.

🔹 Using the Bike Sharing dataset, the tutorial walks through the complete workflow of feature-interaction analysis with SHAP-IQ: installing the required libraries (such as shapiq, scikit-learn, and pandas), loading and splitting the data, training a RandomForestRegressor model, and evaluating its performance (R², MAE, RMSE).

🔹 By configuring a `TabularExplainer` with `max_order=4`, SHAP-IQ computes feature interaction values up to fourth order, letting users explore how combinations of features jointly affect the model's predictions. For local explanations, the `explainer.explain()` method computes these interaction values within a specified budget.

🔹 The article also shows how to compute and visualize first-order interaction values, which are simply standard Shapley values reflecting only each feature's individual contribution. The `plot_waterfall` function then shows how each feature moves the prediction step by step from the baseline to the final result, intuitively revealing the positive effects of features such as weather and humidity and the negative effects of features such as temperature and year.

In this tutorial, we explore how to use the SHAP-IQ package to uncover and visualize feature interactions in machine learning models using Shapley Interaction Indices (SII), building on the foundation of traditional Shapley values.

Shapley values are great for explaining individual feature contributions in AI models but fail to capture feature interactions. Shapley interactions go a step further by separating individual effects from interactions, offering deeper insights—like how longitude and latitude together influence house prices. In this tutorial, we’ll get started with the shapiq package to compute and explore these Shapley interactions for any model. Check out the Full Codes here
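Before the full Bike Sharing walkthrough, here is a minimal sketch (our own illustration, not part of the original article) of why interactions matter: on synthetic data whose target depends only on the product of two features, neither feature is informative alone, so an order-2 interaction is needed to explain the prediction. The synthetic data, feature count, and budget below are illustrative assumptions.

import numpy as np
import shapiq
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: the target is driven by the product of features 0 and 1,
# a pure interaction effect that single-feature Shapley values cannot isolate.
rng = np.random.default_rng(0)
X_toy = rng.uniform(-1, 1, size=(500, 8))
y_toy = X_toy[:, 0] * X_toy[:, 1] + 0.05 * rng.normal(size=500)

toy_model = RandomForestRegressor(random_state=0).fit(X_toy, y_toy)

# k-SII values up to order 2 separate individual effects from pairwise interactions
toy_explainer = shapiq.TabularExplainer(model=toy_model, data=X_toy, index="k-SII", max_order=2)
toy_values = toy_explainer.explain(X_toy[0], budget=256)
print(toy_values)  # the (0, 1) pair should carry most of the explanation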

Installing the dependencies

!pip install shapiq overrides scikit-learn pandas numpy

Data Loading and Pre-processing

In this tutorial, we’ll use the Bike Sharing dataset from OpenML. After loading the data, we’ll split it into training and testing sets to prepare it for model training and evaluation. Check out the Full Codes here

import shapiq
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
import numpy as np

# Load data
X, y = shapiq.load_bike_sharing(to_numpy=True)

# Split into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Model Training and Performance Evaluation

# Train model
model = RandomForestRegressor()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

print(f"R² Score: {r2:.4f}")
print(f"Mean Absolute Error: {mae:.4f}")
print(f"Root Mean Squared Error: {rmse:.4f}")

Setting up an Explainer

We set up a TabularExplainer using the shapiq package to compute Shapley interaction values based on the k-SII (k-order Shapley Interaction Index) method. By specifying max_order=4, we allow the explainer to consider interactions of up to 4 features simultaneously, enabling deeper insights into how groups of features collectively impact model predictions. Check out the Full Codes here

# set up an explainer with k-SII interaction values up to order 4
explainer = shapiq.TabularExplainer(
    model=model,
    data=X,
    index="k-SII",
    max_order=4
)

Explaining a Local Instance

We select a specific test instance (index 100) to generate local explanations. The code prints the true and predicted values for this instance, followed by a breakdown of its feature values. This helps us understand the exact inputs passed to the model and sets the context for interpreting the Shapley interaction explanations that follow. Check out the Full Codes here

# create explanations for different orders
# reload the dataset in its DataFrame form to recover the feature names
X_df, _ = shapiq.load_bike_sharing()
feature_names = list(X_df.columns)  # get the feature names
n_features = len(feature_names)

# select a local instance to be explained
instance_id = 100
x_explain = X_test[instance_id]
y_true = y_test[instance_id]
y_pred = model.predict(x_explain.reshape(1, -1))[0]
print(f"Instance {instance_id}, True Value: {y_true}, Predicted Value: {y_pred}")

for i, feature in enumerate(feature_names):
    print(f"{feature}: {x_explain[i]}")

Analyzing Interaction Values

We use the explainer.explain() method to compute Shapley interaction values for a specific data instance (X[100]) with a budget of 256 model evaluations. This returns an InteractionValues object, which captures how individual features and their combinations influence the model’s output. The max_order=4 means we consider interactions involving up to 4 features. Check out the Full Codes here

interaction_values = explainer.explain(X[100], budget=256)

# analyse interaction values
print(interaction_values)
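To rank interactions programmatically rather than read the printed summary, the hedged sketch below assumes the returned InteractionValues object exposes a dict_values mapping from feature-index tuples to scores (present in recent shapiq releases); check your installed version if the attribute differs.

# Rank pairwise (order-2) interactions by absolute strength.
# NOTE: `dict_values` is an assumed attribute name, not shown in the original code.
pairwise = {
    interaction: value
    for interaction, value in interaction_values.dict_values.items()
    if len(interaction) == 2
}
for (i, j), value in sorted(pairwise.items(), key=lambda kv: abs(kv[1]), reverse=True)[:5]:
    print(f"{feature_names[i]} x {feature_names[j]}: {value:+.3f}")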

First-Order Interaction Values

To keep things simple, we compute first-order interaction values—i.e., standard Shapley values that capture only individual feature contributions (no interactions).

By setting max_order=1 in the TreeExplainer, we’re saying:

“Tell me how much each feature individually contributes to the prediction, without considering any interaction effects.”

These values are known as standard Shapley values. For each feature, it estimates the average marginal contribution to the prediction across all possible permutations of feature inclusion. Check out the Full Codes here
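For reference, this is the quantity being estimated: the Shapley value of feature i averages its marginal contribution over all subsets S of the other features, where v(S) denotes the model's expected prediction when only the features in S are known.

\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left( v(S \cup \{i\}) - v(S) \right)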

# first-order explainer: standard Shapley values (index="SV"), no interaction terms
explainer = shapiq.TreeExplainer(model=model, max_order=1, index="SV")
si_order = explainer.explain(x=x_explain)
si_order

Plotting a Waterfall chart

A Waterfall chart visually breaks down a model’s prediction into individual feature contributions. It starts from the baseline prediction and adds/subtracts each feature’s Shapley value to reach the final predicted output.

In our case, we’ll use the output of TreeExplainer with max_order=1 (i.e., individual contributions only) to visualize the contribution of each feature. Check out the Full Codes here

si_order.plot_waterfall(feature_names=feature_names, show=True)

In our case, the baseline value (i.e., the model’s expected output without any feature information) is 190.717.

As we add the contributions from individual features (order-1 Shapley values), we can observe how each one pushes the prediction up or pulls it down: for this instance, features such as weather and humidity push the prediction above the baseline, while temperature and year pull it below.
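To read these contributions as numbers rather than bars, a small hedged sketch follows; it assumes the InteractionValues object exposes a baseline_value attribute and a dict_values mapping (attribute names from recent shapiq releases, not shown in the original code).

# Print the baseline and each feature's order-1 (Shapley) contribution.
# NOTE: `baseline_value` and `dict_values` are assumed attribute names.
print(f"Baseline prediction: {si_order.baseline_value:.3f}")
for interaction, value in si_order.dict_values.items():
    if len(interaction) == 1:
        print(f"{feature_names[interaction[0]]}: {value:+.3f}")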

Overall, the Waterfall chart helps us understand which features are driving the prediction, and in which direction—providing valuable insight into the model’s decision-making.


