Viewing model coefficients for a single prediction

I built a logistic regression model inside a scikit-learn pipeline as follows:

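from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize the features, then fit a cross-validated logistic regression
pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(
        solver='lbfgs',
        cv=10,
        scoring='roc_auc',
        class_weight='balanced'
    )
)

pipeline.fit(X_train, y_train)

y_pred = pipeline.predict(X_test)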

I can view the model's overall coefficients with this code:

import matplotlib.pyplot as plt
import pandas as pd

# Look at model's coefficients to see what features are most important
plt.rcParams['figure.dpi'] = 50
model = pipeline.named_steps['logisticregressioncv']
coefficients = pd.Series(model.coef_[0], X_train.columns)
plt.figure(figsize=(10, 12))
coefficients.sort_values().plot.barh(color='grey');

This returns a bar chart of the features and their coefficients.

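What I'm trying to do is see how the different input values of a single observation affect its prediction. The idea is to run predictions on a sample population and examine the group with "low" predictions. For example, if I run predictions on 10 observations, I'd like to see how the different input values affect each of those 10 predictions individually.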
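For a linear model like this one, a per-observation breakdown can be sketched directly from the fitted coefficients: each feature's contribution to the log-odds is its coefficient times the scaled input value, and the contributions plus the intercept add up to the model's decision function. The snippet below is a minimal sketch under assumptions that are not part of my actual setup: the data is synthetic (`make_classification`), the column names `feature_0` ... `feature_5` are made up, and the first 10 test rows stand in for the "sample population".

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical; swap in the real X_train / X_test).
X, y = make_classification(n_samples=500, n_features=6, n_informative=4, random_state=0)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(solver="lbfgs", cv=10, scoring="roc_auc", class_weight="balanced"),
)
pipeline.fit(X_train, y_train)

scaler = pipeline.named_steps["standardscaler"]
model = pipeline.named_steps["logisticregressioncv"]

# Take 10 observations and scale them exactly as the pipeline does.
sample = X_test.iloc[:10]
sample_scaled = scaler.transform(sample)

# Contribution of each feature to each observation's log-odds:
# coefficient * standardized value (relative to the training mean).
contributions = pd.DataFrame(
    sample_scaled * model.coef_[0], index=sample.index, columns=sample.columns
)

# Sanity check: contributions + intercept reproduce the decision function.
log_odds = contributions.sum(axis=1) + model.intercept_[0]
assert np.allclose(log_odds, pipeline.decision_function(sample))

# Put the predicted probability next to the breakdown for inspection.
report = contributions.assign(predicted_proba=pipeline.predict_proba(sample)[:, 1])
print(report.round(3))
```

Sorting `report` by `predicted_proba` would put the "low" predictions first, and each row then shows which inputs pushed that particular prediction down or up. The contributions are in log-odds units, not probabilities.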

I recall that I can achieve this with SHAP values using something like the following (but with LinearExplainer instead of TreeExplainer):

import shap
from sklearn.ensemble import RandomForestClassifier

# Instantiate model and encoder outside of pipeline for
# use with shap
model = RandomForestClassifier(random_state=25)
# Fit on train, score on val
model.fit(X_train_encoded, y_train2)
y_pred_shap = model.predict(X_val_encoded)
# Get an individual observation to explain.
row = X_test_encoded.iloc[[-3]]
# Why did the model predict this?
# Look at a Shapley Values Force Plot
# (indexing with [1] assumes a SHAP version that returns one array per class)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(row)
shap.initjs()
shap.force_plot(
    base_value=explainer.expected_value[1],
    shap_values=shap_values[1],
    features=row
)