Why do predictions and scores return different results in classification using scikit-learn?

Question

I wrote a very simple multiclass classifier based on the iris dataset. Here is the code:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC, SVC
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import classification_report

# Load the data
iris = load_iris()
X = iris.data
y = iris.target

# Use label_binarize to be multi-label like settings
Y = label_binarize(y, classes=[0, 1, 2])
n_classes = Y.shape[1]

# Add noisy features
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.concatenate([X, random_state.randn(n_samples, 200 * n_features)], axis=1)

# Split into training and test
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.5, random_state=0 
)

# Create classifier
classifier = OneVsRestClassifier(
    make_pipeline(StandardScaler(), LinearSVC(random_state=random_state))
)

# Train the model
classifier.fit(X_train, y_train)

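My goal is to predict the test-set labels in two ways:

Use the classifier.predict() function, giving y_pred.
Use classifier.decision_function() to get the scores, then pick the highest-scoring class for each instance, giving y_pred_.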

Here is how I did it:

# Get the scores for the Test set
y_score = classifier.decision_function(X_test)

# Make predictions
y_pred  = classifier.predict(X_test)
y_pred_ = label_binarize(np.argmax(y_score, axis=1), classes=[0, 1, 2])

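However, when I compute the classification reports, the results are slightly different. I expected them to be identical, because the predictions are based on the scores from the decision function, as shown in the documentation (line 789). Here are the two reports: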

print(classification_report(y_test, y_pred))
print(classification_report(y_test, y_pred_))

              precision    recall  f1-score   support

           0       0.54      0.62      0.58        21
           1       0.44      0.40      0.42        30
           2       0.36      0.50      0.42        24

   micro avg       0.44      0.49      0.47        75
   macro avg       0.45      0.51      0.47        75
weighted avg       0.45      0.49      0.46        75
 samples avg       0.39      0.49      0.42        75

              precision    recall  f1-score   support

           0       0.42      0.38      0.40        21
           1       0.52      0.47      0.49        30
           2       0.38      0.46      0.42        24

   micro avg       0.44      0.44      0.44        75
   macro avg       0.44      0.44      0.44        75
weighted avg       0.45      0.44      0.44        75
 samples avg       0.44      0.44      0.44        75

What am I doing wrong? Could you suggest a clean, elegant way to make the two reports exactly the same?

Answer 1

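OneVsRestClassifier assumes you want multilabel output, i.e. a single input can have several positive labels (or none at all). It infers this from y_train, which label_binarize turned into a 2-D indicator matrix. Its predictions therefore differ from taking the argmax of decision_function, which always selects exactly one class.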

Try

print(y_pred[0])
print(y_pred_[0])

Output:

[0 1 1]
[0 0 1]
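To see why these two rows differ, here is a minimal NumPy-only sketch of the two decision rules applied to the same scores. The score values are made up for illustration (they are not produced by the model above); the point is that thresholding at 0 can switch on zero or several classes, whereas argmax always switches on exactly one.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Hand-made decision scores for 3 samples x 3 classes (illustrative values only).
scores = np.array([
    [-0.5,  0.3,  0.8],   # two classes score above 0
    [-1.2, -0.4, -0.9],   # no class scores above 0
    [ 0.6, -0.2, -1.0],   # exactly one class scores above 0
])

# Multilabel rule: each class whose score is > 0 is switched on independently.
multilabel_pred = (scores > 0).astype(int)

# Multiclass rule: only the single highest-scoring class is switched on.
multiclass_pred = label_binarize(np.argmax(scores, axis=1), classes=[0, 1, 2])

print(multilabel_pred)
# [[0 1 1]
#  [0 0 0]
#  [1 0 0]]
print(multiclass_pred)
# [[0 0 1]
#  [0 1 0]
#  [1 0 0]]
```

The first row reproduces the pattern above: [0 1 1] under the multilabel rule versus [0 0 1] under argmax.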

Answer 2

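For multilabel classification, you should use

y_pred_ = np.where(classifier.decision_function(X_test) > 0, 1, 0)

to replicate the output of the predict() method, because in this case the classes are not mutually exclusive, i.e. a given sample can belong to several classes.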

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, label_binarize
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import classification_report

# Load the data
iris = load_iris()
X = iris.data
y = label_binarize(iris.target, classes=[0, 1, 2])

# Split the data into training and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Create classifier
classifier = OneVsRestClassifier(
    make_pipeline(StandardScaler(), LinearSVC(random_state=0))
)

# Train the model
classifier.fit(X_train, y_train)

# Make predictions
y_pred  = classifier.predict(X_test)
y_pred_ = np.where(classifier.decision_function(X_test) > 0, 1, 0)

print(classification_report(y_test, y_pred))
#               precision    recall  f1-score   support
#            0       1.00      1.00      1.00        21
#            1       0.58      0.37      0.45        30
#            2       0.95      0.83      0.89        24
#    micro avg       0.85      0.69      0.76        75
#    macro avg       0.84      0.73      0.78        75
# weighted avg       0.82      0.69      0.74        75
#  samples avg       0.66      0.69      0.67        75

print(classification_report(y_test, y_pred_))
#               precision    recall  f1-score   support
#            0       1.00      1.00      1.00        21
#            1       0.58      0.37      0.45        30
#            2       0.95      0.83      0.89        24
#    micro avg       0.85      0.69      0.76        75
#    macro avg       0.84      0.73      0.78        75
# weighted avg       0.82      0.69      0.74        75
#  samples avg       0.66      0.69      0.67        75
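Rather than comparing two printed reports, you can compare the prediction arrays directly. This is a minimal sketch assuming the same iris split as above (no noise features). It relies on predict() thresholding each binary decision score at 0 in multilabel mode, which is how current scikit-learn versions implement OneVsRestClassifier for estimators that have decision_function; treat the equality as an implementation detail rather than a documented guarantee. The last lines count the rows that make argmax disagree with predict().

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, label_binarize
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier

iris = load_iris()
# A 2-D indicator matrix as target puts OneVsRestClassifier in multilabel mode.
Y = label_binarize(iris.target, classes=[0, 1, 2])
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, Y, test_size=0.5, random_state=0
)

classifier = OneVsRestClassifier(
    make_pipeline(StandardScaler(), LinearSVC(random_state=0))
)
classifier.fit(X_train, Y_train)

y_pred = classifier.predict(X_test)
y_pred_ = (classifier.decision_function(X_test) > 0).astype(int)

# Exact element-wise comparison instead of eyeballing two reports.
same = np.array_equal(y_pred, y_pred_)
print("identical:", same)

# Rows with zero or several positive labels are exactly the rows where
# an argmax-based prediction (always one label per row) must disagree.
labels_per_row = y_pred.sum(axis=1)
print("rows with no label:", int((labels_per_row == 0).sum()))
print("rows with 2+ labels:", int((labels_per_row > 1).sum()))
```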

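For multiclass classification, you can instead use

y_pred_ = np.argmax(classifier.decision_function(X_test), axis=1)

in your code, because in this case the classes are mutually exclusive, i.e. each sample is assigned to exactly one class. Note that the code below fits on the original 1-D labels (iris.target) rather than the binarized matrix; that is what keeps OneVsRestClassifier in multiclass mode.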

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import classification_report

# Load the data
iris = load_iris()
X = iris.data
y = iris.target

# Split into training and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Create classifier
classifier = OneVsRestClassifier(
    make_pipeline(StandardScaler(), LinearSVC(random_state=0))
)

# Train the model
classifier.fit(X_train, y_train)

# Make predictions
y_pred  = classifier.predict(X_test)
y_pred_ = np.argmax(classifier.decision_function(X_test), axis=1)

print(classification_report(y_test, y_pred))
#               precision    recall  f1-score   support
#            0       1.00      1.00      1.00        21
#            1       0.85      0.73      0.79        30
#            2       0.71      0.83      0.77        24
#     accuracy                           0.84        75
#    macro avg       0.85      0.86      0.85        75
# weighted avg       0.85      0.84      0.84        75

print(classification_report(y_test, y_pred_))
#               precision    recall  f1-score   support
#            0       1.00      1.00      1.00        21
#            1       0.85      0.73      0.79        30
#            2       0.71      0.83      0.77        24
#     accuracy                           0.84        75
#    macro avg       0.85      0.86      0.85        75
# weighted avg       0.85      0.84      0.84        75
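If you still want the binarized layout that label_binarize produced in the question, one option is to fit on the 1-D labels and binarize only after predicting. This is a sketch under that assumption; the variable names Y_pred and Y_pred_ are illustrative. Because predict() and argmax then agree row by row, both binarized arrays (and therefore both classification reports) come out identical, barring exact ties between decision scores.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, label_binarize
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier

classes = [0, 1, 2]
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0
)

# Fit on the 1-D label vector so OneVsRestClassifier stays in multiclass mode.
classifier = OneVsRestClassifier(
    make_pipeline(StandardScaler(), LinearSVC(random_state=0))
)
classifier.fit(X_train, y_train)

# Scores still have one column per class.
y_score = classifier.decision_function(X_test)

# Binarize only after predicting, so each row has exactly one positive label.
Y_test = label_binarize(y_test, classes=classes)
Y_pred = label_binarize(classifier.predict(X_test), classes=classes)
Y_pred_ = label_binarize(np.argmax(y_score, axis=1), classes=classes)

print(np.array_equal(Y_pred, Y_pred_))
```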


Tags: python, classification, confusion-matrix, scikit-learn, multiclass-classification