Why does precision_recall_curve() return different values than confusion matrix?
I wrote the following code to compute precision and recall for a multiclass classification problem:
import numpy as np
import matplotlib.pyplot as plt
from itertools import cycle
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import roc_auc_score

def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return idx

# Import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Binarize the output
y = label_binarize(y, classes=[0, 1, 2])
n_classes = y.shape[1]

# Add noisy features to make the problem harder
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]

# Shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Learn to predict each class against the other
classifier = OneVsRestClassifier(
    svm.SVC(kernel="linear", probability=True, random_state=random_state)
)
classifier.fit(X_train, y_train)
y_score = classifier.decision_function(X_test)

# Confusion matrix
from sklearn.metrics import classification_report
y_test_pred = classifier.predict(X_test)
print(classification_report(y_test, y_test_pred))
# Compute the precision-recall curve and the threshold nearest to 0 for each class
precision = dict()
recall = dict()
threshold = dict()
for i in range(n_classes):
    c = classifier.classes_[i]
    precision[c], recall[c], threshold[c] = precision_recall_curve(y_test[:, c], y_score[:, c])
    th0 = find_nearest(threshold[c], 0)
    print(c, round(precision[c][th0], 2), round(recall[c][th0], 2))
What I would like to do is recompute the precision and recall shown by the confusion matrix
              precision    recall  f1-score   support

           0       0.73      0.52      0.61        21
           1       1.00      0.07      0.12        30
           2       0.57      0.33      0.42        24

   micro avg       0.68      0.28      0.40        75
   macro avg       0.77      0.31      0.39        75
weighted avg       0.79      0.28      0.36        75
 samples avg       0.28      0.28      0.28        75
by using the precision_recall_curve() function. In theory, it should return exactly the same results as the confusion matrix at a threshold equal to 0. However, my results do not match the final ones:
  precision recall
0 0.75 0.57
1 1.0 0.1
2 0.6 0.38
Could you explain this discrepancy and how to correctly compute the values reported by the confusion matrix?
As I wrote in a comment, considering index th0 + 1 rather than index th0 will solve your issue. However, this might hold only for this case (in this specific example, the threshold nearest to 0 always corresponds to a negative score); therefore, for a programmatic approach, imo you should modify find_nearest so that it returns the index of the positive threshold closest to 0. Indeed, by adding
print(th0, threshold[c][th0-1], threshold[c][th0], threshold[c][th0+1])
you'll get the following output:
20 -0.011161920989200713 -0.01053513227868108 0.016453546101096173
67 -0.04226738229343663 -0.0074193008862454835 0.09194626401603534
38 -0.011860865951094923 -0.003756310149749531 0.0076752136658660985
For a more programmatic approach, you can naively modify find_nearest as follows and keep index th0 within the loop:
def find_nearest_new(array, value):
    array = np.asarray(array)
    # Mask non-positive thresholds with a large sentinel so that the argmin
    # picks the positive threshold closest to the target value
    idx = (np.abs(np.where(array > 0, array, 999) - value)).argmin()
    return idx

...

for i in range(n_classes):
    c = classifier.classes_[i]
    precision[c], recall[c], threshold[c] = precision_recall_curve(y_test[:, c], y_score[:, c])
    th0 = find_nearest_new(threshold[c], 0)
    print(c, round(precision[c][th0], 6), round(recall[c][th0], 6), round(threshold[c][th0], 6))
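Alternatively (a sketch of my own, not part of the original answer): since the thresholds array returned by precision_recall_curve is sorted in increasing order, np.searchsorted can locate the first strictly positive threshold directly, without the sentinel trick:

for i in range(n_classes):
    c = classifier.classes_[i]
    # side="right" returns the index of the first element strictly greater
    # than 0; this assumes at least one threshold is positive
    th0 = np.searchsorted(threshold[c], 0, side="right")
    print(c, round(precision[c][th0], 6), round(recall[c][th0], 6), round(threshold[c][th0], 6))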
My clue is the following, namely that within precision_recall_curve precision and recall are defined as follows:
precision: ndarray of shape (n_thresholds + 1,)
Precision values such that element i is the precision of predictions with score >= thresholds[i] and the last element is 1.
recall: ndarray of shape (n_thresholds + 1,)
Decreasing recall values such that element i is the recall of predictions with score >= thresholds[i] and the last element is 0.
In other words, if you sort the scores in descending order (as the implementation does), you'll see that the selected threshold (whether or not you consider index th0 + 1) coincides with one of the score values for each class (indeed, the thresholds are nothing but the distinct score values). On the other hand, if you stick to index th0 (in this specific example), you'll get scores that are strictly smaller than threshold=0.
for i in range(n_classes):
    c = classifier.classes_[i]
    precision[c], recall[c], threshold[c] = precision_recall_curve(y_test[:, c], y_score[:, c])
    th0 = find_nearest(threshold[c], 0)
    print(c, round(precision[c][th0+1], 6), round(recall[c][th0+1], 6), round(threshold[c][th0+1], 6))
    # print(c, precision[c], recall[c], threshold[c])
    print(np.sort(y_score[:, c])[::-1])
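To make the quoted docstring concrete, here is a small check (my sketch, not from the original answer): for any index i, precision[c][i] should equal the precision of the hard predictions obtained with the rule score >= threshold[c][i]:

i = th0 + 1  # reuse c and th0 from the last loop iteration above
y_pred_at_t = (y_score[:, c] >= threshold[c][i]).astype(int)
tp = np.sum((y_pred_at_t == 1) & (y_test[:, c] == 1))
fp = np.sum((y_pred_at_t == 1) & (y_test[:, c] == 0))
print(tp / (tp + fp), precision[c][i])  # the two values should agree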
This post might help to understand how things work within precision_recall_curve().
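As a closing sanity check (again a sketch of mine, not part of the original answer), the per-class rows of the classification report can be recomputed directly from the hard predictions; those predictions should in turn agree with thresholding the decision scores at 0, which is what ties the report to threshold=0 in the first place:

from sklearn.metrics import precision_score, recall_score

# Per-class precision/recall from the hard predictions; these should
# reproduce the per-class rows of the classification report above
for i in range(n_classes):
    p = precision_score(y_test[:, i], y_test_pred[:, i])
    r = recall_score(y_test[:, i], y_test_pred[:, i])
    print(i, round(p, 2), round(r, 2))

# For this OneVsRest/SVC setup, predict() is expected to coincide with
# thresholding the decision function at 0 (barring ties at exactly 0)
print(np.array_equal(y_test_pred, (y_score > 0).astype(int)))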