Viewing model coefficients for a single prediction
I built a logistic regression model in a scikit-learn pipeline using the following:
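from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(
        solver='lbfgs',
        cv=10,
        scoring='roc_auc',
        class_weight='balanced'
    )
)
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)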
I can view the model's overall coefficients with this code...
import matplotlib.pyplot as plt
import pandas as pd

# Look at the model's coefficients to see which features are most important
plt.rcParams['figure.dpi'] = 50
model = pipeline.named_steps['logisticregressioncv']
coefficients = pd.Series(model.coef_[0], X_train.columns)
plt.figure(figsize=(10, 12))
coefficients.sort_values().plot.barh(color='grey');
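which returns a bar chart of the features and their coefficients.
What I'm trying to do is see how the different input values of a single observation affect its prediction. The idea is to run predictions on a sample population and examine the group with "low" predictions... For example, if I run predictions on 10 observations, I'd like to see how the different input values affect each of those 10 predictions individually.
I recall that I could achieve this with SHAP values using something like the following (but with LinearExplainer instead of TreeExplainer):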
import shap
from sklearn.ensemble import RandomForestClassifier

# Instantiate model and encoder outside of pipeline for
# use with shap
model = RandomForestClassifier(random_state=25)
# Fit on train, score on val
model.fit(X_train_encoded, y_train2)
y_pred_shap = model.predict(X_val_encoded)
# Get an individual observation to explain.
row = X_test_encoded.iloc[[-3]]
# Why did the model predict this?
# Look at a Shapley Values Force Plot
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(row)
shap.initjs()
shap.force_plot(
    base_value=explainer.expected_value[1],
    shap_values=shap_values[1],
    features=row
)
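For what it's worth, here is a rough sketch of the kind of per-observation breakdown I'm after, done by hand rather than with shap. Since the model is linear in the scaled features, each feature's contribution to a row's log-odds is just its coefficient times its scaled value, and the intercept acts as the baseline. The synthetic data, the `feature_i` column names, and `cv=5` are stand-ins I made up so the snippet runs on its own; they are not my real data or settings. My understanding is that a linear explainer assuming independent features would give a similar decomposition, but I have not verified that against shap.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical; swap in the real X_train / X_test).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
cols = [f"feature_{i}" for i in range(X.shape[1])]
X_train = pd.DataFrame(X[:200], columns=cols)
y_train = y[:200]
X_test = pd.DataFrame(X[200:], columns=cols)

# Same pipeline shape as in the question (cv reduced only to keep this quick).
pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(
        solver="lbfgs", cv=5, scoring="roc_auc", class_weight="balanced"
    ),
)
pipeline.fit(X_train, y_train)

scaler = pipeline.named_steps["standardscaler"]
model = pipeline.named_steps["logisticregressioncv"]

# Take 10 observations and apply the same scaling the pipeline applies.
rows = X_test.iloc[:10]
scaled = pd.DataFrame(scaler.transform(rows), columns=cols, index=rows.index)

# One row per observation, one column per feature:
# contribution = coefficient * scaled feature value (in log-odds units).
contributions = scaled * model.coef_[0]

# Summing a row's contributions and adding the intercept gives that row's log-odds.
log_odds = contributions.sum(axis=1) + model.intercept_[0]

print(contributions.round(3))
```

Each row of `contributions` could then be plotted with `.plot.barh()` the same way as the overall coefficients, one chart per observation. Note that these values are in log-odds, not probabilities.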