Get a classification report stating the class-wise precision and recall for multinomial Naive Bayes using 10-fold cross-validation

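I have the following code, which uses an NB classifier for a multi-class classification problem. The function performs cross-validation by storing the accuracy of each fold and printing the average later. What I want instead is a classification report that gives the class-wise precision and recall, rather than a single mean accuracy score at the end.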

    from sklearn import cross_validation
    from sklearn.metrics import accuracy_score
    from sklearn.naive_bayes import MultinomialNB

    def multinomial_nb_with_cv(x_train, y_train):
        # Shuffle inside KFold so that samples and labels stay aligned
        kf = cross_validation.KFold(x_train.shape[0], n_folds=10, shuffle=True)
        acc = []
        for train_index, test_index in kf:
            y_true = y_train[test_index]
            clf = MultinomialNB().fit(x_train[train_index], y_train[train_index])
            y_pred = clf.predict(x_train[test_index])
            acc.append(accuracy_score(y_true, y_pred))
        return acc  # per-fold accuracies; the mean is printed later

Without cross-validation, all I have to do is:

    from sklearn.metrics import classification_report
    from sklearn.naive_bayes import MultinomialNB

    def multinomial_nb(x_train, y_train, x_test, y_test):
        clf = MultinomialNB().fit(x_train, y_train)
        y_pred = clf.predict(x_test)
        y_true = y_test
        print(classification_report(y_true, y_pred))

which gives me a report like this:

        precision    recall  f1-score   support

      0       0.50      0.24      0.33       221
      1       0.00      0.00      0.00        18
      2       0.00      0.00      0.00        27
      3       0.00      0.00      0.00        28
      4       0.00      0.00      0.00        32
      5       0.04      0.02      0.02        57
      6       0.00      0.00      0.00        26
      7       0.00      0.00      0.00        25
      8       0.00      0.00      0.00        43
      9       0.00      0.00      0.00        99
     10       0.63      0.98      0.76       716

    avg / total       0.44      0.59      0.48      1292

How can I get a similar report when using cross-validation?

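You can use cross_val_predict to generate cross-validated predictions, where each sample is predicted by a model fitted on the folds that do not contain it, and then pass those predictions to classification_report: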

    from sklearn.datasets import make_classification
    # sklearn.cross_validation was removed in scikit-learn 0.20;
    # on newer versions import from sklearn.model_selection instead
    from sklearn.cross_validation import cross_val_predict
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import classification_report

    # generate some artificial data with 11 classes
    X, y = make_classification(n_samples=2000, n_features=20, n_informative=10, n_classes=11, random_state=0)

    # your classifier; GaussianNB is assumed here because X is non-integer data
    estimator = GaussianNB()
    # generate cross-validated predictions with 10-fold stratified sampling
    y_pred = cross_val_predict(estimator, X, y, cv=10)
    y_pred.shape  # (2000,)

    # generate report
    print(classification_report(y, y_pred))

                 precision    recall  f1-score   support

              0       0.47      0.36      0.41       181
              1       0.38      0.46      0.41       181
              2       0.45      0.53      0.48       182
              3       0.29      0.45      0.35       183
              4       0.37      0.33      0.35       183
              5       0.40      0.44      0.42       182
              6       0.27      0.13      0.17       183
              7       0.47      0.44      0.45       182
              8       0.34      0.27      0.30       182
              9       0.41      0.44      0.42       179
             10       0.42      0.41      0.41       182

    avg / total       0.39      0.39      0.38      2000
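
Since the question is about MultinomialNB, here is a minimal sketch of the same idea with that classifier. It assumes scikit-learn 0.20 or newer (imports from `sklearn.model_selection`, `output_dict` support), and the data is a hypothetical stand-in: synthetic Poisson counts for 3 classes, not the asker's dataset. It also shows that pooling predictions in a hand-written fold loop, as in the question's code, should give the same predictions as cross_val_predict when the same splitter is used.

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-in data: 3 classes, 20 non-negative count features
# (MultinomialNB expects non-negative inputs such as word counts).
rng = np.random.RandomState(0)
y = np.repeat(np.arange(3), 100)              # 100 samples per class
rates = rng.uniform(1.0, 5.0, size=(3, 20))   # one Poisson rate vector per class
X = rng.poisson(rates[y])                     # shape (300, 20)

# Fixed splitter so the folds are reproducible across calls.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Each sample is predicted by the model trained on the folds that exclude it.
y_pred = cross_val_predict(MultinomialNB(), X, y, cv=cv)

# One report over all out-of-fold predictions.
print(classification_report(y, y_pred))

# The same numbers as a nested dict, e.g. report["0"]["recall"].
report = classification_report(y, y_pred, output_dict=True)

# Equivalent manual loop: pool the out-of-fold predictions, then report once.
y_pooled = np.empty_like(y)
for train_idx, test_idx in cv.split(X, y):
    clf = MultinomialNB().fit(X[train_idx], y[train_idx])
    y_pooled[test_idx] = clf.predict(X[test_idx])
```

Note that this report is computed once over the pooled predictions, which is not the same as averaging ten per-fold reports; the two can differ, especially for rare classes.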