Using SHAP with a custom sklearn estimator

Question

I have the following custom estimator, which uses an sklearn pipeline if one is provided and applies a target transformation if needed:

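from copy import deepcopy

import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# base_model_class, X_train, y_train and X_test are defined elsewhere in my code
class model_linear_regression(base_model_class):
    def __init__(self, pipe=None, inverse=False):
        self.name = 'Linear_Regression'
        self.model = LinearRegression()

        # Use the given preprocessing pipeline, or a bare one-step pipeline
        if pipe is None:
            self.pipe = Pipeline([('model', self.model)])
        else:
            self.pipe = deepcopy(pipe)
            self.pipe.steps.append(('model', self.model))

        # Optionally fit on log1p(y) and map predictions back with expm1
        if inverse:
            self.pipe = TransformedTargetRegressor(regressor=self.pipe,
                                                   func=np.log1p,
                                                   inverse_func=np.expm1)

    def fit(self, X: pd.DataFrame = X_train, y: pd.Series = y_train):
        self.pipe.fit(X, y)
        return self

    def predict(self, X: pd.DataFrame = X_test):
        y_pred = self.pipe.predict(X)
        return y_pred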

Using it with SHAP raises the following error:

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
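
A likely trigger for this message, assuming the frame handed to SHAP contains at least one non-numeric column (an assumption, since the data is not shown here): converting a mixed-type DataFrame to NumPy yields an object array, and np.isfinite rejects object arrays. A minimal sketch with made-up column names:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric and one string column (not from the question).
df = pd.DataFrame({"area": [50.0, 72.5, 64.0], "city": ["A", "B", "A"]})

mixed = df.to_numpy()              # mixed column types collapse to dtype=object
numeric = df[["area"]].to_numpy()  # numeric-only columns stay float64


def isfinite_ok(arr):
    """Return True if np.isfinite accepts the array, False if it raises TypeError."""
    try:
        np.isfinite(arr)
        return True
    except TypeError:
        return False


print(mixed.dtype, isfinite_ok(mixed))
print(numeric.dtype, isfinite_ok(numeric))
```

If this is what is happening, the check fails before the pipeline ever gets a chance to encode the non-numeric columns.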

Note:

The pipeline passes an np.ndarray to the estimator, not a pd.DataFrame.

Example:

def get_shap(model, X, y):
    train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=.3, random_state=42)
    model.fit(train_X, train_y)
    explainer = shap.Explainer(model.predict, test_X)
    shap_values = explainer(test_X)
    return shap_values

results = get_shap(model_linear_regression(pipe=LINEAR_PIPE, inverse=True), X, y)

How can I make this work?

Answer 1

OK, I found a working solution:

import shap

# Select the model
shap_model = model_linear_regression(pipe=LINEAR_PIPE, inverse=True)
# Fit the model
model_fitted = shap_model.fit(X_train, y_train)

# Summarize the data (not required, but it speeds things up).
# To skip this step, replace every X_test_summary with X_test.
X_test_summary = shap.sample(X_test, 10)

# Explain the predictions on the sampled rows (K = 10 here)
explainer = shap.KernelExplainer(model_fitted.predict, X_test_summary, keep_index=True)
shap_values = explainer.shap_values(X_test_summary)

shap.summary_plot(shap_values, X_test_summary)
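
If the pipeline selects columns by name, another option besides keep_index=True might be to wrap predict so that whatever array the explainer passes in is turned back into a DataFrame first. This is only a sketch: as_frame_predict and ColumnSumModel are hypothetical names used for illustration, and rebuilding the frame from an array loses the original per-column dtypes.

```python
import numpy as np
import pandas as pd


def as_frame_predict(model, columns):
    """Wrap model.predict so it accepts a 2-D array and passes on a DataFrame."""
    columns = list(columns)

    def predict(data):
        # Restore the column names before handing the data to the model
        frame = pd.DataFrame(np.asarray(data), columns=columns)
        return np.asarray(model.predict(frame))

    return predict


class ColumnSumModel:
    """Hypothetical stand-in for a fitted estimator that needs column names."""

    def predict(self, X):
        return (X["a"] + 2 * X["b"]).to_numpy()


predict_fn = as_frame_predict(ColumnSumModel(), ["a", "b"])
preds = predict_fn(np.array([[1.0, 2.0], [3.0, 4.0]]))
print(preds)
```

With the names used above, that would mean passing as_frame_predict(model_fitted, X_test.columns) to the explainer in place of model_fitted.predict.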

These are some of the errors I ran into along the way, with the issues where I found the fixes:

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

https://github.com/slundberg/shap/issues/1357

AttributeError: 'Kernel' object has no attribute 'masker'

https://github.com/slundberg/shap/issues/1375

New problem:

The results are now an np.ndarray, and not every plot accepts that, so a workaround is still needed.
