How can I compute the F1 measure for each class in multiclass classification?

Question

I am using SciKit (scikit-learn) as the library for classification algorithms such as Naive Bayes (NB) and SVM.

Here is a very nice binary classification implementation for "SPAM vs. HAM" emails:

    from sklearn.metrics import confusion_matrix, f1_score

    confusion += confusion_matrix(test_y, predictions)
    score = f1_score(test_y, predictions, pos_label=SPAM)
    # note: in my case (3 classes) I do not need to set pos_label

If I have three classes, e.g. {SPAM, HAM, NORMAL}, instead of two, how can I adapt this code to get the F1-score for each class as well as the average over all classes?

Answer 1

The problem here is that, IMHO, the F1 measure does not really make sense for multi-class problems. It is the harmonic mean of precision and recall.

Precision is the probability that a (randomly selected) instance classified as positive is actually positive.

Recall is the probability that a (randomly selected) positive instance is classified as positive.
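When these binary definitions are applied to a multi-class problem, they are usually evaluated one class at a time: class c is treated as "positive" and every other class as "negative" (one-vs-rest). Writing TP_c, FP_c and FN_c for the true positives, false positives and false negatives of class c, and K for the number of classes, the per-class scores and their unweighted (macro) average can be sketched as:

```latex
P_c = \frac{TP_c}{TP_c + FP_c}, \qquad
R_c = \frac{TP_c}{TP_c + FN_c}, \qquad
F1_c = \frac{2\, P_c R_c}{P_c + R_c}

F1_{\text{macro}} = \frac{1}{K} \sum_{c=1}^{K} F1_c
```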

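These definitions are inherently binary. Usually I would report the F1 measure for each class separately. This also lets you decide which kinds of errors are acceptable to you. From my personal experience, I would actually report both precision and recall. In your example, misclassifying a legitimate email (HAM or NORMAL) as SPAM would be extremely harmful, so precision for SPAM matters more than recall.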
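A minimal, library-free sketch of reporting precision, recall and F1 separately for each class, as suggested above. It uses hypothetical toy labels; per_class_prf is an illustrative helper (not a scikit-learn function), and returning 0.0 when a denominator is zero is an assumed convention.

```python
def per_class_prf(y_true, y_pred, labels):
    """One-vs-rest precision, recall and F1 for each label."""
    scores = {}
    for c in labels:
        # Treat class c as "positive" and every other class as "negative"
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # Assumed convention: a score with a zero denominator is 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1)
    return scores


# Hypothetical toy data using the three classes from the question
labels = ["SPAM", "HAM", "NORMAL"]
y_true = ["SPAM", "SPAM", "HAM", "HAM", "NORMAL", "NORMAL"]
y_pred = ["SPAM", "HAM", "HAM", "HAM", "NORMAL", "SPAM"]

scores = per_class_prf(y_true, y_pred, labels)
for c in labels:
    p, r, f = scores[c]
    print(f"{c:<7} precision={p:.3f} recall={r:.3f} f1={f:.3f}")

# Macro average: unweighted mean of the per-class F1 scores
macro_f1 = sum(f for _, _, f in scores.values()) / len(labels)
print(f"macro F1 = {macro_f1:.3f}")
```

In this toy data, SPAM precision is 0.5 because one NORMAL email was predicted as SPAM, which is exactly the kind of error described above as most harmful.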

For a broader overview, including a list of measures, you can also look at http://rali.iro.umontreal.ca/rali/sites/default/files/publis/SokolovaLapalme-JIPM09.pdf

Answer 2

Use the classification report (classification_report) from sklearn to compute F-scores for multiple classes.

    from sklearn.metrics import classification_report as cr

    gold = []
    pred = []
    # given a test set with annotated gold labels
    for testinstance, goldlabel in testdata:
        gold.append(goldlabel)
        # clf is your classifier object with a predict method; predict expects
        # a 2-D input and returns an array, so wrap the instance and take [0]
        predictedlabel = clf.predict([testinstance])[0]
        pred.append(predictedlabel)
    print(cr(gold, pred, digits=4))
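A self-contained sketch of the same idea with hypothetical toy labels in place of testdata and clf. Besides the printed report, it uses f1_score with average=None (one score per class) and average="macro" / "weighted" (averages over classes); output_dict=True, available in scikit-learn 0.20 and later, returns the report as nested dicts.

```python
from sklearn.metrics import classification_report, f1_score

# Hypothetical toy labels; in practice y_true/y_pred come from your test set
labels = ["SPAM", "HAM", "NORMAL"]
y_true = ["SPAM", "SPAM", "HAM", "HAM", "NORMAL", "NORMAL"]
y_pred = ["SPAM", "HAM", "HAM", "HAM", "NORMAL", "SPAM"]

# Human-readable table: per-class precision/recall/F1 plus averages
print(classification_report(y_true, y_pred, labels=labels, digits=4))

# The same F1 numbers programmatically: one score per class, in `labels` order
per_class_f1 = f1_score(y_true, y_pred, labels=labels, average=None)

# Unweighted mean over classes, and mean weighted by each class's support
macro_f1 = f1_score(y_true, y_pred, labels=labels, average="macro")
weighted_f1 = f1_score(y_true, y_pred, labels=labels, average="weighted")

# Report as nested dicts, e.g. report["HAM"]["f1-score"]
report = classification_report(y_true, y_pred, labels=labels, output_dict=True)

for name, score in zip(labels, per_class_f1):
    print(f"{name}: {score:.4f}")
print(f"macro F1: {macro_f1:.4f}, weighted F1: {weighted_f1:.4f}")
```

With these toy labels the per-class F1 scores are 0.5 (SPAM), 0.8 (HAM) and about 0.667 (NORMAL); because every class has the same support here, the macro and weighted averages coincide.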

computer-science

machine-learning

nltk

text-classification