SHAP: exporting shap values from KernelExplainer to pandas dataframe
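I am working on a binary classification problem and using KernelExplainer to explain the results of my model (logistic regression).
My code is as follows: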
X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.30, random_state=42)
lr = LogisticRegression() # fit and predict statements not shown
masker = Independent(X_train, max_samples=100)
explainer = KernelExplainer(lr.predict,X_train)
bv = explainer.expected_value
sv = explainer.shap_values(X_train)
sdf_train = pd.DataFrame({
'row_id': X_train.index.values.repeat(X_train.shape[1]),
'feature': X_train.columns.to_list() * X_train.shape[0],
'feature_value': X_train.values.flatten(),
'base_value': bv,
'shap_values': sv.values[:,:,1].flatten() #error here I guess
})
This raised the first error shown below, so I changed the last line to 'shap_values': pd.DataFrame(sv).values[:,1].flatten()
but then I got the second error shown below:
numpy.ndarray has no attribute values
ValueError: All arrays must be of the same length
Regarding data types, my X_train is a DataFrame and sv is a numpy.ndarray.
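The two errors above can be reproduced without SHAP at all. The sketch below is only an illustration: sv_demo is a hypothetical zero-filled array standing in for sv, with made-up dimensions of 4 rows and 3 features. It shows that a plain ndarray has no .values attribute, and that selecting a single column yields one entry per row rather than one entry per (row, feature) pair, which is why the column lengths no longer match.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `sv`: 4 rows, 3 features (not real SHAP output).
sv_demo = np.zeros((4, 3))

# First error: a NumPy array has no `.values` attribute (that is a pandas thing).
has_values = hasattr(sv_demo, "values")

# Second error: picking column 1 leaves only n_rows entries...
one_column = pd.DataFrame(sv_demo).values[:, 1].flatten()

# ...whereas the other long-format columns need n_rows * n_features entries.
full = sv_demo.flatten()

print(has_values, one_column.shape, full.shape)
```

Passing one_column alongside arrays of length 12 to pd.DataFrame is what triggers "All arrays must be of the same length".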
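I expect my output to look like the one below (ignore the variation in the base value; it should be constant). The output structure is as follows.
Do the following: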
import pandas as pd
from shap import KernelExplainer, sample
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
lr = LogisticRegression(max_iter=10000).fit(X_train, y_train)
# Use a 100-row sample of the training data as the background set
background = sample(X_train, 100)
explainer = KernelExplainer(lr.predict, background)
sv = explainer.shap_values(X_train)
bv = explainer.expected_value
Note the shape of sv:
sv.shape
(398, 30)
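sv is already a 2-D array with one row per sample and one column per feature, which means it can be flattened directly: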
sdf_train = pd.DataFrame({
'row_id': X_train.index.values.repeat(X_train.shape[1]),
'feature': X_train.columns.to_list() * X_train.shape[0],
'feature_value': X_train.values.flatten(),
'base_value': bv,
'shap_values': sv.flatten()  # sv is a plain ndarray, so no .values is needed
})
sdf_train
row_id feature feature_value base_value shap_values
0 149 mean radius 13.74000 0.67 0.000000
1 149 mean texture 17.91000 0.67 -0.014988
2 149 mean perimeter 88.12000 0.67 0.060759
3 149 mean area 585.00000 0.67 0.028677
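To see why the flattened columns line up, here is a minimal sketch on synthetic data. X_demo, sv_demo and bv_demo are made-up stand-ins (assumptions, not real SHAP output) for X_train, sv and bv. It uses np.repeat and np.tile in place of the list arithmetic above, and checks the shapes before building the frame. Row-major flattening walks each row feature by feature, so repeating the row ids and tiling the feature names gives the same ordering.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins (assumptions): a tiny feature frame and a same-shaped
# array playing the role of the array returned by explainer.shap_values().
X_demo = pd.DataFrame(
    {"f1": [1.0, 2.0, 3.0], "f2": [10.0, 20.0, 30.0]},
    index=[149, 7, 42],
)
sv_demo = np.array([[0.1, -0.2], [0.3, 0.0], [-0.5, 0.4]])
bv_demo = 0.67  # scalar base value, broadcast to every row

n_rows, n_features = X_demo.shape
# Guard against the length mismatch before building the frame.
assert sv_demo.shape == (n_rows, n_features)

long_df = pd.DataFrame({
    "row_id": np.repeat(X_demo.index.to_numpy(), n_features),  # 149, 149, 7, 7, ...
    "feature": np.tile(X_demo.columns.to_numpy(), n_rows),     # f1, f2, f1, f2, ...
    "feature_value": X_demo.to_numpy().ravel(),                # row-major order
    "base_value": bv_demo,
    "shap_values": sv_demo.ravel(),                            # same row-major order
})
print(long_df)
```

If the shape check fails on real data (for example because the explainer returned one array per class), the class of interest would need to be selected first; how that looks depends on the shap version and the prediction function used.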