Using SHAP with a custom sklearn estimator
I am using the following custom estimator, which wraps an sklearn pipeline if one is provided and applies a target transformation when needed:
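    from copy import deepcopy

    import numpy as np
    import pandas as pd
    from sklearn.compose import TransformedTargetRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import Pipeline

    # base_model_class, X_train, y_train and X_test are defined elsewhere (not shown here)
    class model_linear_regression(base_model_class):
        def __init__(self, pipe=None, inverse=False):
            self.name = 'Linear_Regression'
            self.model = LinearRegression()
            if pipe is None:
                # No preprocessing supplied: the pipeline holds only the model
                self.pipe = Pipeline([('model', self.model)])
            else:
                # Copy the supplied pipeline and append the model as the last step
                self.pipe = deepcopy(pipe)
                self.pipe.steps.append(('model', self.model))
            if inverse:
                # Fit on log1p(y) and map predictions back with expm1
                self.pipe = TransformedTargetRegressor(regressor=self.pipe,
                                                       func=np.log1p,
                                                       inverse_func=np.expm1)

        def fit(self, X: pd.DataFrame = X_train, y: pd.Series = y_train):
            self.pipe.fit(X, y)
            return self

        def predict(self, X: pd.DataFrame = X_test):
            y_pred = self.pipe.predict(X)
            return y_pred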
Using it with SHAP raises the following error:

    TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Note:
- The pipeline passes an np.ndarray to the estimator, not a pd.DataFrame.
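The note above can be checked with a small sketch. It assumes sklearn's default output configuration (no `set_output(transform="pandas")`), and `TypeRecordingRegressor`, `X_demo`, `y_demo` and `pipe_demo` are hypothetical names made up for this illustration, not part of the code in the question.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class TypeRecordingRegressor(LinearRegression):
    """Hypothetical helper: remembers the type of X it was fitted on."""

    def fit(self, X, y, sample_weight=None):
        # Record what the previous pipeline step actually handed over
        self.seen_type_ = type(X).__name__
        return super().fit(X, y, sample_weight=sample_weight)


# Toy data, for illustration only
X_demo = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [0.5, 0.1, 0.9, 0.3]})
y_demo = pd.Series([1.0, 2.1, 2.9, 4.2])

pipe_demo = Pipeline([("scale", StandardScaler()), ("model", TypeRecordingRegressor())])
pipe_demo.fit(X_demo, y_demo)  # a DataFrame goes in at the top of the pipeline

# Type of X seen by the final estimator after the scaler has run
seen_type = pipe_demo.named_steps["model"].seen_type_
print(seen_type)
```

With default settings the scaler returns a NumPy array, so the final step no longer sees column names or pandas dtypes even though a DataFrame was passed to the pipeline.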
Example:
    import shap
    from sklearn.model_selection import train_test_split

    def get_shap(model, X, y):
        train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=.3, random_state=42)
        model.fit(train_X, train_y)
        explainer = shap.Explainer(model.predict, test_X)
        shap_values = explainer(test_X)
        return shap_values

    results = get_shap(model_linear_regression(pipe=LINEAR_PIPE, inverse=True), X, y)
How can I make this work?

OK... found a workable solution:
    # Select the model
    shap_model = model_linear_regression(pipe=LINEAR_PIPE, inverse=True)
    # Fit the model
    model_fitted = shap_model.fit(X_train, y_train)
    # Summarize the data (not necessary, but makes things faster).
    # If you do not summarize, replace every X_test_summary with X_test.
    X_test_summary = shap.sample(X_test, 10)
    # Build the explainer on the summarized data and compute the SHAP values
    explainer = shap.KernelExplainer(model_fitted.predict, X_test_summary, keep_index=True)
    shap_values = explainer.shap_values(X_test_summary)
    shap.summary_plot(shap_values, X_test_summary)
These are some other errors I ran into along the way, which the solution above also resolved:

    IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
AttributeError: 'Kernel' object has no attribute 'masker'
New problem:
- Not all plots are available when the result is an np.ndarray, so I still need to find a way around that.