Confusion matrix over multiple thresholds

Question

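I'm trying to run sklearn.metrics.confusion_matrix (efficiently) over multiple thresholds. I need this so that I can tell a client what performance to expect at any given percentage of the population challenged.

Currently I do this in a loop over every possible threshold, but it is slow and inefficient. Is there a way to do it in a one-liner or something similar?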

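import numpy as np
from sklearn.metrics import confusion_matrix

# df has a 'score' column (model output in [0, 1]) and a 'true' column (0/1 labels)
threshold_list = np.linspace(1, 0, 1001).tolist()
y_true = df['true'].astype('int16').values
for threshold in threshold_list:
    # recompute predictions from scratch so rows below the threshold are 0, not stale/NaN
    y_pred = (df['score'] >= threshold).astype('int16').values
    # labels=[0, 1] keeps the matrix 2x2 even when only one class is predicted
    arr = confusion_matrix(y_true, y_pred, labels=[0, 1])
    # ....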
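A minimal sketch of a loop-free version of the code above, using NumPy broadcasting instead of calling confusion_matrix once per threshold. It assumes binary 0/1 labels and that a score equal to the threshold counts as positive (the same >= as in the loop). The function name confusion_counts_over_thresholds and the pct_flagged column are hypothetical; pct_flagged is the share of the population flagged positive, which is what the "percentage of the population challenged" question needs.

```python
import numpy as np
import pandas as pd


def confusion_counts_over_thresholds(y_true, scores, thresholds):
    """Hypothetical helper: TN, FP, FN, TP for every threshold in one pass.

    Builds an (n_thresholds, n_samples) boolean matrix, so memory grows
    with n_thresholds * n_samples.
    """
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)

    # Row i holds the predictions for thresholds[i]: score >= threshold -> positive
    pred = scores[None, :] >= thresholds[:, None]

    tp = (pred & y_true).sum(axis=1)
    fp = (pred & ~y_true).sum(axis=1)
    fn = y_true.sum() - tp      # actual positives that were not flagged
    tn = (~y_true).sum() - fp   # actual negatives that were not flagged

    return pd.DataFrame({
        'threshold': thresholds,
        'TN': tn, 'FP': fp, 'FN': fn, 'TP': tp,
        # share of the whole population flagged as positive at this threshold
        'pct_flagged': (tp + fp) / len(scores),
    })
```

Usage with the question's columns would be confusion_counts_over_thresholds(df['true'], df['score'], np.linspace(1, 0, 1001)). For very large datasets the boolean matrix can get big; sorting the scores once and taking cumulative sums of the labels gives the same counts in O(n log n) time and O(n) memory.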

Answer 1

If TPR and FPR are enough for you, you can do the following:

import pandas as pd
from sklearn.metrics import roc_curve

y_true = [1, 0, 0, 1, 1, 0, 0]
y_pred = [0.67, 0.48, 0.27, 0.52, 0.63, 0.45, 0.53]  # predicted scores, not 0/1 labels
fpr, tpr, thresholds = roc_curve(y_true, y_pred)
res = pd.DataFrame({'TPR': tpr, 'FPR': fpr, 'Threshold': thresholds})
print(res)

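Output (newer scikit-learn versions also prepend a row with TPR = FPR = 0 and an artificial threshold above every score, inf since version 1.3):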

        TPR   FPR  Threshold
0  0.333333  0.00       0.67
1  0.666667  0.00       0.63
2  0.666667  0.25       0.53
3  1.000000  0.25       0.52
4  1.000000  1.00       0.27
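TPR and FPR are in fact enough to rebuild the full confusion matrix at each threshold: with P actual positives and N actual negatives, TP = TPR·P, FN = P − TP, FP = FPR·N and TN = N − FP. A minimal sketch, assuming 0/1 labels with both classes present in y_true; the helper name confusion_from_roc and the PctFlagged column are hypothetical. Passing drop_intermediate=False keeps every distinct score as a threshold; the default drops some collinear points, which is why 0.48 and 0.45 do not appear in the output above.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_curve


def confusion_from_roc(y_true, y_score):
    """Hypothetical helper: full confusion counts per threshold from roc_curve."""
    y_true = np.asarray(y_true)
    n_pos = int((y_true == 1).sum())  # assumes at least one positive...
    n_neg = len(y_true) - n_pos       # ...and at least one negative

    # drop_intermediate=False keeps every distinct score as a threshold
    fpr, tpr, thresholds = roc_curve(y_true, y_score, drop_intermediate=False)

    # rates times class sizes give back the integer counts
    tp = np.rint(tpr * n_pos).astype(int)
    fp = np.rint(fpr * n_neg).astype(int)

    return pd.DataFrame({
        'Threshold': thresholds,
        'TN': n_neg - fp, 'FP': fp,
        'FN': n_pos - tp, 'TP': tp,
        # share of the population flagged positive at this threshold
        'PctFlagged': (tp + fp) / len(y_true),
    })
```

Because roc_curve sorts the scores once internally, this avoids the per-threshold loop entirely, and the PctFlagged column maps each row directly to a percentage of the population challenged.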
