Tree SHAP plus XGBoost - list indices must be integers or slices, not tuple
I have been trying to figure out how to use tree SHAP to further evaluate my XGBoost classifier. I've run into some data issue that must be user error. I don't understand what about the input is causing this...
Input: shap.force_plot(explainer.expected_value, shap_values[0,:], X.iloc[0,:])
Setup:
import shap
import numpy as np
import matplotlib.pylab as pl
# load JS visualization code to notebook
shap.initjs()
Explainer configuration and shap_values:
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
Generating the display:
shap.force_plot(explainer.expected_value, shap_values[0,:], X.iloc[0,:])
Error:
It's something with the shap_values generation. I don't understand what is wrong with the X (DataFrame) I'm passing in, compared to the X (DataFrame) they get from the Boston dataset in their example.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-28-dd74c29cab4f> in <module>
----> 1 shap.force_plot(explainer.expected_value, shap_values[0,:], mean_data.iloc[0,:])
TypeError: list indices must be integers or slices, not tuple
Output:
print(explainer.expected_value)
print(shap_values) # Is a list as Robin Niel thought
[-0.84587, 1.0577996, 1.1045177]
[array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32), array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32), array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)]
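To confirm what the printed output shows, here is a quick sketch (zero arrays stand in for the real SHAP values; the shapes are illustrative, not taken from the question):

```python
import numpy as np

# Stand-in for the printed shap_values above: a plain Python list with
# one (n_samples, n_features) array per class (three classes here).
shap_values = [np.zeros((6, 8), dtype=np.float32) for _ in range(3)]

print(type(shap_values))     # <class 'list'>
print(len(shap_values))      # 3, one matrix per class
print(shap_values[0].shape)  # (6, 8), i.e. (n_samples, n_features)
```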
From the README: https://github.com/slundberg/shap/blob/master/README.md
import xgboost
import shap
# load JS visualization code to notebook
shap.initjs()
# train XGBoost model
X,y = shap.datasets.boston()
model = xgboost.train({"learning_rate": 0.01}, xgboost.DMatrix(X, label=y), 100)
# explain the model's predictions using SHAP
# (same syntax works for LightGBM, CatBoost, scikit-learn and spark models)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
# visualize the first prediction's explanation (use matplotlib=True to avoid Javascript)
shap.force_plot(explainer.expected_value, shap_values[0,:], X.iloc[0,:])
What type is shap_values? The documentation says:
For models with a single output this returns a matrix of SHAP values (# samples x # features). Each row sums to the difference between the model output for that sample and the expected value of the model output (which is stored in the expected_value attribute of the explainer when it is constant). For models with vector outputs this returns a list of such matrices, one for each output.
If you are in the second case, it is probably a list (so native Python, as I understand it) and therefore you can't use numpy indexing the way you are doing (shap_values[0,:]). If that's the case, I think you just need to do shap_values[0]. Let me know if this solves your problem.
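To illustrate this, a minimal sketch (zero arrays standing in for the real SHAP values) that reproduces the error and the fix:

```python
import numpy as np

# For a multiclass model, explainer.shap_values(X) returns a plain
# Python list with one (n_samples, n_features) array per class, and a
# Python list rejects the numpy-style tuple index [0, :].
shap_values = [np.zeros((5, 3), dtype=np.float32) for _ in range(3)]

try:
    shap_values[0, :]  # tuple index into a list
except TypeError as err:
    print(err)         # list indices must be integers or slices, not tuple

# Fix: select the class first, then numpy-index the resulting array.
row = shap_values[0][0, :]
print(row.shape)       # (3,)
```

Note that the same rule would apply to `explainer.expected_value`, which for a multiclass model is also a per-class list, so the force plot call would presumably become something like `shap.force_plot(explainer.expected_value[0], shap_values[0][0,:], X.iloc[0,:])`.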