Why are non-appearing classes shown in the classification report?

Question

I am working on NER and using sklearn.metrics.classification_report to compute micro and macro F1 scores. It printed a table like this:

              precision    recall  f1-score   support

           0     0.0000    0.0000    0.0000         0
           3     0.0000    0.0000    0.0000         0
           4     0.8788    0.9027    0.8906       257
           5     0.9748    0.9555    0.9650      1617
           6     0.9862    0.9888    0.9875      1156
           7     0.9339    0.9138    0.9237       835
           8     0.8542    0.7593    0.8039       216
           9     0.8945    0.8575    0.8756       702
          10     0.9428    0.9382    0.9405      1668
          11     0.9234    0.9139    0.9186      1661

    accuracy                         0.9285      8112
   macro avg     0.7388    0.7230    0.7305      8112
weighted avg     0.9419    0.9285    0.9350      8112

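Clearly, the predicted labels include '0' and '3', but the true labels contain neither. Why does the classification report show these two classes, which have no samples? And how can I prevent the zero-support classes from being shown? These two classes seem to have a large impact on the macro F1 score.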

Answer 1

You can use the following snippet to ensure that every label in the classification report actually occurs in y_true:

import numpy as np
from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1, 42]
# Report only on the classes that are present in the ground truth
print(classification_report(y_true, y_pred, labels=np.unique(y_true)))

Which outputs:

              precision    recall  f1-score   support

           0       0.50      1.00      0.67         1
           1       0.00      0.00      0.00         1
           2       1.00      0.50      0.67         4

   micro avg       0.60      0.50      0.55         6
   macro avg       0.50      0.50      0.44         6
weighted avg       0.75      0.50      0.56         6

As you can see, label 42 from the predictions is not shown, because it has no support in y_true.
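
For context on why the zero-support rows appear in the first place: when labels is not passed, scikit-learn builds the label set from the union of the values in y_true and y_pred, so a class that is only ever predicted still gets a row and still counts toward the macro average. The sketch below reuses the toy data from this answer and is only an illustration. The variable names are my own, unique_labels is called just to display that union, and zero_division=0 assumes scikit-learn 0.22 or later.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.utils.multiclass import unique_labels

y_true = [0, 1, 2, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1, 42]

# Union of labels seen in either array; this is the default label set.
all_labels = unique_labels(y_true, y_pred)
# Labels that actually occur in the ground truth.
present = np.unique(y_true)
# Labels that were predicted but never occur in y_true (zero support).
zero_support = np.setdiff1d(all_labels, present)
print("zero-support labels:", zero_support.tolist())

# Default: the zero-support class contributes an F1 of 0 to the macro mean.
macro_all = f1_score(y_true, y_pred, average="macro", zero_division=0)

# Restricted: average only over the classes present in y_true.
macro_present = f1_score(
    y_true, y_pred, labels=present, average="macro", zero_division=0
)

print(f"macro F1 over all labels:    {macro_all:.4f}")
print(f"macro F1 over y_true labels: {macro_present:.4f}")
```

On this toy data the macro F1 rises from about 0.33 to about 0.44 once label 42 is excluded, which is the same kind of effect classes 0 and 3 have in the question. Note that a sample wrongly predicted as 42 still counts as a miss for its true class, so restricting labels only removes the extra zero row from the average; it does not hide the error itself.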

named-entity-recognition

scikit-learn