Using SHAP with a custom sklearn estimator

I have the following custom estimator, which wraps an sklearn pipeline if one is provided and applies a target transformation when requested:

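from copy import deepcopy

import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


# base_model_class, X_train, y_train and X_test are defined elsewhere
class model_linear_regression(base_model_class):
    def __init__(self, pipe=None, inverse=False):
        self.name = 'Linear_Regression'
        self.model = LinearRegression()

        # Use the bare model, or append it to a copy of the supplied pipeline
        if pipe is None:
            self.pipe = Pipeline([('model', self.model)])
        else:
            self.pipe = deepcopy(pipe)
            self.pipe.steps.append(('model', self.model))

        # Optionally fit on log1p(y) and map predictions back with expm1
        if inverse:
            self.pipe = TransformedTargetRegressor(regressor=self.pipe,
                                                   func=np.log1p,
                                                   inverse_func=np.expm1)

    def fit(self, X: pd.DataFrame = X_train, y: pd.Series = y_train):
        self.pipe.fit(X, y)
        return self

    def predict(self, X: pd.DataFrame = X_test):
        y_pred = self.pipe.predict(X)
        return y_pred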

Using it with SHAP raises the following error:

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
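One situation that produces this exact message is calling np.isfinite on an object-dtype array, which is what a DataFrame with string or mixed-type columns becomes when it is converted to NumPy. The question does not show the data, so whether that is the cause here is an assumption; the sketch below only reproduces the message on made-up arrays (mixed, numeric and isfinite_fails are hypothetical names).

```python
import numpy as np

# Hypothetical data: a numeric column next to a string column, stored as
# an object-dtype array (what a mixed-type DataFrame turns into).
mixed = np.array([[1.0, "a"], [2.0, "b"]], dtype=object)

# The same rows with the string column already encoded as numbers.
numeric = np.array([[1.0, 0.0], [2.0, 1.0]])


def isfinite_fails(arr):
    """Return True if np.isfinite raises TypeError on this array."""
    try:
        np.isfinite(arr)
    except TypeError:
        return True
    return False


print("object dtype fails:", isfinite_fails(mixed))
print("float dtype fails:", isfinite_fails(numeric))
```

If this is what is happening, inspecting X.dtypes for object columns before handing the data to SHAP would be a reasonable first check.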

Example:

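import shap
from sklearn.model_selection import train_test_split


def get_shap(model, X, y):
    # Hold out 30% of the data, fit on the rest, and explain the held-out rows
    train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=42)
    model.fit(train_X, train_y)
    explainer = shap.Explainer(model.predict, test_X)
    shap_values = explainer(test_X)
    return shap_values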

results = get_shap(model_linear_regression(pipe=LINEAR_PIPE, inverse=True), X, y)

How can I get this to work?

OK, I found a working solution:

# Select the model
shap_model = model_linear_regression(pipe=LINEAR_PIPE, inverse=True)
# Fit the model
model_fitted = shap_model.fit(X_train, y_train)

# Subsample the data (not necessary, but makes things faster).
# To skip this step, replace every X_test_summary below with X_test.
X_test_summary = shap.sample(X_test, 10)

# Explain the predictions for the sampled rows
explainer = shap.KernelExplainer(model_fitted.predict, X_test_summary, keep_index=True)
shap_values = explainer.shap_values(X_test_summary)

shap.summary_plot(shap_values, X_test_summary)

Here are some of the other errors I ran into along the way and found solutions for:

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

AttributeError: 'Kernel' object has no attribute 'masker'

New problem:

  • Not all plots are available when the result is an np.ndarray, so I still need to find a way around that.
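One possible way around the np.ndarray limitation is to wrap the raw array in a shap.Explanation object, which the newer shap.plots functions take as input. The sketch below is not tested against this exact setup: it assumes a single-output regressor, and explanation_kwargs is a hypothetical helper that only assembles the constructor arguments (values, base_values, data, feature_names). The demo uses dummy numbers in place of explainer.shap_values(...) and explainer.expected_value.

```python
import numpy as np


def explanation_kwargs(shap_values, expected_value, X, feature_names=None):
    """Assemble keyword arguments for shap.Explanation from raw arrays."""
    values = np.asarray(shap_values, dtype=float)
    # Take column names from a DataFrame-like X if none were given.
    if feature_names is None and hasattr(X, "columns"):
        feature_names = [str(c) for c in X.columns]
    data = np.asarray(X)
    if values.shape != data.shape:
        raise ValueError(f"shape mismatch: {values.shape} vs {data.shape}")
    # Repeat the single expected value so there is one base value per row.
    base = float(np.ravel(expected_value)[0])
    base_values = np.full(values.shape[0], base)
    return {
        "values": values,
        "base_values": base_values,
        "data": data,
        "feature_names": feature_names,
    }


# Dummy numbers standing in for the real SHAP output.
demo = explanation_kwargs(
    shap_values=np.array([[0.5, -0.2], [0.1, 0.3]]),
    expected_value=2.0,
    X=np.array([[1.0, 4.0], [2.0, 5.0]]),
    feature_names=["f0", "f1"],
)
print(demo["base_values"])

# With shap installed, the result could then be used like this:
#   import shap
#   explanation = shap.Explanation(
#       **explanation_kwargs(shap_values, explainer.expected_value, X_test_summary))
#   shap.plots.beeswarm(explanation)
```

The shape check is there because the plots pair each SHAP value with the feature value in the same position, so the two arrays have to line up row by row and column by column.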