Force_plot for multiclass probability explainer
I'm running into an error with the Python SHAP library. Creating a force plot based on log-odds works fine, but I can't create one based on probabilities. The goal is for the base_values and shap_values to sum to the predicted probabilities.
This works:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xgboost as xgb
import sklearn
import shap
X, y = shap.datasets.iris()
X_display, y_display = shap.datasets.iris(display=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size = 0.2, random_state = 42)
#fit xgboost model
params = {
'objective': "multi:softprob",
'eval_metric': "mlogloss",
'num_class': 3
}
xgb_fit = xgb.train(
params = params
, dtrain = xgb.DMatrix(data = X_train, label = y_train)
)
#create shap values and perform tests
explainer = shap.TreeExplainer(xgb_fit)
shap_values = explainer.shap_values(X_train)
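For reference, a minimal additivity check in log-odds space, assuming shap 0.39's behavior of returning one SHAP array per class for multiclass models:
# Sketch: with the default model_output="raw", base value + SHAP values per class
# should reproduce the raw log-odds margins.
margins = xgb_fit.predict(xgb.DMatrix(X_train), output_margin=True)  # shape (n, 3)
for k in range(3):
    recovered = explainer.expected_value[k] + shap_values[k].sum(axis=1)
    assert np.allclose(recovered, margins[:, k], atol=1e-3)
# A force plot in log-odds space then works as expected:
shap.force_plot(explainer.expected_value[0], shap_values[0][0], X_train.iloc[0])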
This doesn't work:
explainer = shap.TreeExplainer(
model = xgb_fit
, data = X_train
, feature_perturbation='interventional'
, model_output = 'probability'
)
Packages used:
matplotlib 3.4.1
numpy 1.20.2
pandas 1.2.4
scikit-learn 0.24.1
shap 0.39.0
xgboost 1.4.1
To see your multiclass classification's raw scores in probability space, try KernelExplainer. Being model-agnostic, it can explain clf.predict_proba directly, so the base value and SHAP values sum to predicted probabilities:
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from shap import datasets, KernelExplainer, force_plot, initjs
from scipy.special import softmax, expit
initjs()
X, y = datasets.iris()
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
clf = XGBClassifier(random_state=42,
eval_metric="mlogloss",
use_label_encoder=False)
clf.fit(X_train, y_train)
ke = KernelExplainer(clf.predict_proba, data=X_train)
shap_values = ke.shap_values(X_test)
force_plot(ke.expected_value[1], shap_values[1][0], feature_names=X.columns)
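As a quick sketch of the additivity the question asks for (using the objects above): the class-1 base value plus the class-1 SHAP values of the first test row should reproduce its predicted probability, matching the sanity check below.
# Sketch: base value + SHAP contributions for class 1 = predicted probability.
pred_from_shap = ke.expected_value[1] + shap_values[1][0].sum()
print(pred_from_shap)  # ≈ 0.9867, i.e. clf.predict_proba(X_test[:1])[0, 1]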
Sanity check:
- Expected result (up to rounding error):
clf.predict_proba(X_test[:1])
#array([[0.0031177 , 0.9867134 , 0.01016894]], dtype=float32)
- Base values:
clf.predict_proba(X_train).mean(0)
#array([0.3339472 , 0.34133017, 0.32472247], dtype=float32)
(or, if you prefer, np.unique(y_train, return_counts=True)[1]/len(y_train))
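As an aside, the softmax import above is the link between raw scores and probability space; a sketch, assuming xgboost's sklearn wrapper accepts output_margin in predict (it does in 1.4):
# Sketch: raw per-class margins pushed through softmax give predict_proba.
raw = clf.predict(X_test[:1], output_margin=True)
softmax(raw, axis=1)  # ≈ array([[0.0031, 0.9867, 0.0102]])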