SHAP: plotting a waterfall using an index value in a DataFrame

Question

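I am using a random forest for binary classification.

I am now trying to explain the model's predictions with SHAP values.

So I followed this helpful post and tried the approach below.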

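from shap import TreeExplainer, Explanation
from shap.plots import waterfall
explainer = TreeExplainer(model)    # model: the fitted RandomForestClassifier
sv = explainer(ord_test_t)          # explain every row of the test set
exp = Explanation(sv.values[:,:,1],   # SHAP values for class 1
                  sv.base_values[:,1], 
                  data=ord_test_t.values, 
                  feature_names=ord_test_t.columns)
idx = 20
waterfall(exp[idx])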

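I like the approach above because it shows the feature values in the waterfall plot, so I want to keep using it.

However, it does not let me get the waterfall plot for a specific row of ord_test_t (the test data).

For example, suppose ord_test_t.index.tolist() returns 3, 5, 8, 9, etc.

Now I want to plot the waterfall for the row whose index label is 9 (ord_test_t.loc[[9]]), but when I pass exp[9], I get the row at position 9, not the row labelled 9.

When I try exp.iloc[[9]], it throws an error saying the Explanation object doesn't have iloc.

Can anyone help me solve this?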
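The behaviour described above comes from the fact that an Explanation (like the NumPy array sv.values it wraps) is indexed only by position; it does not carry the DataFrame's index labels. One way to bridge the two is pandas' Index.get_loc, which turns a label into a position. The sketch below is illustrative only: the toy DataFrame, the stand-in array and the helper position_of are hypothetical, assuming the SHAP rows are stored in the same order as the DataFrame rows (which holds when the whole DataFrame is passed to the explainer at once).

```python
import numpy as np
import pandas as pd

# Hypothetical toy test set whose index is NOT 0..n-1 (like ord_test_t).
test_df = pd.DataFrame(
    {"f1": [0.1, 0.2, 0.3, 0.4], "f2": [1.0, 2.0, 3.0, 4.0]},
    index=[3, 5, 8, 9],
)

# Stand-in for sv.values[:, :, 1]: one row of SHAP values per test row,
# in the same order as test_df. A plain array has no index labels.
shap_rows = np.arange(8, dtype=float).reshape(4, 2)


def position_of(df, label):
    """Translate an index label into the positional offset used by arrays/Explanation."""
    pos = df.index.get_loc(label)
    # get_loc returns a slice or boolean mask if the label is duplicated.
    if not isinstance(pos, (int, np.integer)):
        raise KeyError(f"index label {label!r} is not unique")
    return pos


pos = position_of(test_df, 9)
print(pos)             # position of the row labelled 9
print(shap_rows[pos])  # the SHAP row belonging to label 9
# With the real objects from the question this would be: waterfall(exp[pos])
```

Checking uniqueness matters because, with duplicated labels, get_loc does not return a single integer and indexing exp with it would not select one row.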

Answer 1

My suggestion is as follows:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from shap import TreeExplainer, Explanation
from shap.plots import waterfall

import shap

print(shap.__version__)

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

idx = 9
model = RandomForestClassifier(max_depth=5, n_estimators=100).fit(X, y)
explainer = TreeExplainer(model)
sv = explainer(X.loc[[idx]])    # corrected, pass the row of interest as df
exp = Explanation(
    sv.values[:, :, 1],         # class to explain
    sv.base_values[:, 1],
    data=X.loc[[idx]].values,   # corrected, pass the row of interest as df
    feature_names=X.columns,
)
waterfall(exp[0])               # exp holds a single row, so it sits at position 0

0.40.0

Verification:

model.predict_proba(X.loc[[idx]]) # corrected

array([[0.95752656, 0.04247344]])
