Get a classification report stating the class-wise precision and recall for multinomial Naive Bayes using 10-fold cross-validation
I have the following code, which uses a Naive Bayes classifier for a multi-class classification problem. The function performs cross-validation by storing the accuracy of each fold and printing the average later. What I would like instead is a classification report specifying class-wise precision and recall, rather than a single mean accuracy score at the end.
from sklearn import cross_validation
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

def multinomial_nb_with_cv(x_train, y_train):
    # shuffle within KFold instead of random.shuffle(X),
    # which would break the alignment between features and labels
    kf = cross_validation.KFold(len(y_train), n_folds=10, shuffle=True)
    acc = []
    for train_index, test_index in kf:
        y_true = y_train[test_index]
        clf = MultinomialNB().fit(x_train[train_index],
                                  y_train[train_index])
        y_pred = clf.predict(x_train[test_index])
        acc.append(accuracy_score(y_true, y_pred))
    print(sum(acc) / len(acc))
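For reference, the same per-fold loop can be written against the current scikit-learn API, where `KFold` lives in `sklearn.model_selection` (the old `sklearn.cross_validation` module was removed in 0.20). The data below is synthetic non-negative counts, purely for illustration:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# synthetic non-negative integer features, as MultinomialNB expects
rng = np.random.RandomState(0)
x_train = rng.randint(0, 5, size=(200, 10))
y_train = rng.randint(0, 3, size=200)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
acc = []
for train_index, test_index in kf.split(x_train):
    clf = MultinomialNB().fit(x_train[train_index], y_train[train_index])
    y_pred = clf.predict(x_train[test_index])
    acc.append(accuracy_score(y_train[test_index], y_pred))
print(sum(acc) / len(acc))
```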
If I were not doing cross-validation, all I would need is:
from sklearn.metrics import classification_report
from sklearn.naive_bayes import MultinomialNB

def multinomial_nb(x_train, y_train, x_test, y_test):
    clf = MultinomialNB().fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    y_true = y_test
    print(classification_report(y_true, y_pred))
which gives me a report like this:
precision recall f1-score support
0 0.50 0.24 0.33 221
1 0.00 0.00 0.00 18
2 0.00 0.00 0.00 27
3 0.00 0.00 0.00 28
4 0.00 0.00 0.00 32
5 0.04 0.02 0.02 57
6 0.00 0.00 0.00 26
7 0.00 0.00 0.00 25
8 0.00 0.00 0.00 43
9 0.00 0.00 0.00 99
10 0.63 0.98 0.76 716
avg / total 0.44 0.59 0.48 1292
How do I get a similar report when using cross-validation?
You can use cross_val_predict to generate the cross-validated predictions, and then pass them to classification_report.
from sklearn.datasets import make_classification
from sklearn.cross_validation import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report
# generate some artificial data with 11 classes
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10, n_classes=11, random_state=0)
# your classifier; GaussianNB is assumed here because X is not integer count data
estimator = GaussianNB()
# generate cross-validated predictions with 10-fold stratified sampling
y_pred = cross_val_predict(estimator, X, y, cv=10)
y_pred.shape
# (2000,)
# generate report
print(classification_report(y, y_pred))
precision recall f1-score support
0 0.47 0.36 0.41 181
1 0.38 0.46 0.41 181
2 0.45 0.53 0.48 182
3 0.29 0.45 0.35 183
4 0.37 0.33 0.35 183
5 0.40 0.44 0.42 182
6 0.27 0.13 0.17 183
7 0.47 0.44 0.45 182
8 0.34 0.27 0.30 182
9 0.41 0.44 0.42 179
10 0.42 0.41 0.41 182
avg / total 0.39 0.39 0.38 2000
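Note that `cross_val_predict` has since moved to `sklearn.model_selection` (the `sklearn.cross_validation` module was removed in scikit-learn 0.20). A sketch of the same idea under the newer API, using `MultinomialNB` on synthetic non-negative count data, might look like:

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

# synthetic non-negative integer features, as MultinomialNB expects
rng = np.random.RandomState(0)
X = rng.randint(0, 10, size=(1000, 20))
y = rng.randint(0, 5, size=1000)

estimator = MultinomialNB()
# for a classifier with an integer cv, this uses stratified 10-fold splitting
y_pred = cross_val_predict(estimator, X, y, cv=10)
print(classification_report(y, y_pred))
```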