Tree SHAP plus XGBoost - list indices must be integers or slices, not tuple
I have been trying to figure out how to use tree SHAP to further evaluate my XGBoost classifier. I've run into some data issue that must be user error. I don't understand what about the input is causing this...
Input: shap.force_plot(explainer.expected_value, shap_values[0,:], X.iloc[0,:])
Setup:
import shap
import numpy as np
import matplotlib.pylab as pl
# load JS visualization code to notebook
shap.initjs()
Explainer configuration and shap_values:
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
Generating the display:
shap.force_plot(explainer.expected_value, shap_values[0,:], X.iloc[0,:])
Error:
It's something with the shap_values generation. I don't understand what is wrong with the X (DataFrame) I'm passing in, compared to the X (DataFrame) they get from the Boston dataset in their example.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-28-dd74c29cab4f> in <module>
----> 1 shap.force_plot(explainer.expected_value, shap_values[0,:], mean_data.iloc[0,:])
TypeError: list indices must be integers or slices, not tuple
Output:
print(explainer.expected_value)
print(shap_values) # Is a list as Robin Niel thought
[-0.84587, 1.0577996, 1.1045177]
[array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32), array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32), array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)]
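To confirm what the printed output shows, here is a quick sketch (zero arrays stand in for the real SHAP values; the shapes are illustrative, not taken from the question):

```python
import numpy as np

# Stand-in for the printed shap_values above: a plain Python list with
# one (n_samples, n_features) array per class (three classes here).
shap_values = [np.zeros((6, 8), dtype=np.float32) for _ in range(3)]

print(type(shap_values))     # <class 'list'>
print(len(shap_values))      # 3, one matrix per class
print(shap_values[0].shape)  # (6, 8), i.e. (n_samples, n_features)
```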
From the README: https://github.com/slundberg/shap/blob/master/README.md
import xgboost
import shap
# load JS visualization code to notebook
shap.initjs()
# train XGBoost model
X,y = shap.datasets.boston()
model = xgboost.train({"learning_rate": 0.01}, xgboost.DMatrix(X, label=y), 100)
# explain the model's predictions using SHAP
# (same syntax works for LightGBM, CatBoost, scikit-learn and spark models)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
# visualize the first prediction's explanation (use matplotlib=True to avoid Javascript)
shap.force_plot(explainer.expected_value, shap_values[0,:], X.iloc[0,:])
What type is shap_values? The documentation says:
For models with a single output this returns a matrix of SHAP values (# samples x # features). Each row sums to the difference between the model output for that sample and the expected value of the model output (which is stored in the expected_value attribute of the explainer when it is constant). For models with vector outputs this returns a list of such matrices, one for each output.
If you are in the second case, it is probably a list (so native Python, as I understand it) and therefore you can't use numpy indexing the way you are doing (shap_values[0,:]). If that's the case, I think you just need to do shap_values[0]. Let me know if this solves your problem.
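To illustrate this, a minimal sketch (zero arrays standing in for the real SHAP values) that reproduces the error and the fix:

```python
import numpy as np

# For a multiclass model, explainer.shap_values(X) returns a plain
# Python list with one (n_samples, n_features) array per class, and a
# Python list rejects the numpy-style tuple index [0, :].
shap_values = [np.zeros((5, 3), dtype=np.float32) for _ in range(3)]

try:
    shap_values[0, :]  # tuple index into a list
except TypeError as err:
    print(err)         # list indices must be integers or slices, not tuple

# Fix: select the class first, then numpy-index the resulting array.
row = shap_values[0][0, :]
print(row.shape)       # (3,)
```

Note that the same rule would apply to `explainer.expected_value`, which for a multiclass model is also a per-class list, so the force plot call would presumably become something like `shap.force_plot(explainer.expected_value[0], shap_values[0][0,:], X.iloc[0,:])`.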