How to calculate average classification report across all folds?
I am trying to do binary classification. Since I have a small dataset (275 samples), I performed leave-one-out cross-validation, and I would like to get an averaged classification report and AUROC/AUPRC across all folds.
I closely followed [a linked answer] to get my results, but I cannot understand what the last line of the code does.
for i in classifiers:
    print(i)
    originalclass = []
    predictedclass = []
    model = i
    loo = LeaveOneOut()

    print('Scores before feature selection')
    scores = cross_val_score(model, subset, y, cv=loo, scoring=make_scorer(classification_report_with_accuracy_score))
    print("CV score", np.mean(cross_val_score(model, subset, y, cv=loo, scoring='roc_auc')))
    print(classification_report(originalclass, predictedclass))

    print('Scores after feature selection')
    X_reduced = feature_reduction_using_RFECV(model, subset, y)
    scores = cross_val_score(model, X_reduced, y, cv=loo, scoring=make_scorer(classification_report_with_accuracy_score))
    print("CV score", np.mean(cross_val_score(model, X_reduced, y, cv=loo, scoring='roc_auc')))
    print(classification_report(originalclass, predictedclass))
Where does the averaging happen in the code above? I compute the mean CV score and print it, but the line after that is what confuses me the most. I initialize the originalclass and predictedclass variables at the start, but where are they used before being printed in this last line?
print(classification_report(originalclass, predictedclass))
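(For context: the scorer passed to make_scorer in the linked approach is presumably a small wrapper that appends each fold's true and predicted labels to the module-level lists as a side effect, then returns an ordinary scalar score so that cross_val_score still works. A minimal sketch, where the function body is an assumption inferred from how the globals are used:)

```python
from sklearn.metrics import accuracy_score

originalclass = []
predictedclass = []

def classification_report_with_accuracy_score(y_true, y_pred):
    # Side effect: pool each fold's labels into the module-level lists,
    # so a single combined classification report can be printed afterwards
    originalclass.extend(y_true)
    predictedclass.extend(y_pred)
    # Return a regular scalar score so cross_val_score can consume it
    return accuracy_score(y_true, y_pred)
```

With LeaveOneOut, each call receives a single test sample, so after all folds have run, originalclass and predictedclass together cover the whole dataset.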
Edited code
for i in classifiers:
    print(i)
    originalclass = y
    model = i
    loo = LeaveOneOut()

    print('Scores before feature selection')
    y_pred = cross_val_predict(model, subset, y, cv=loo)
    print(classification_report(originalclass, y_pred))
    print("CV score", np.mean(cross_val_score(model, subset, y, cv=loo, scoring='roc_auc')))

    print('Scores after feature selection')
    X_reduced = feature_reduction_using_RFECV(model, subset, y)
    y_pred = cross_val_predict(model, X_reduced, y, cv=loo)
    print(classification_report(originalclass, y_pred))
    print("CV score", np.mean(cross_val_score(model, X_reduced, y, cv=loo, scoring='roc_auc')))
When you use

print("CV score", np.mean(cross_val_score(model, X_reduced, y, cv=loo, scoring='roc_auc')))

you print the model's mean cross-validated roc_auc metric under the cv scheme, i.e. LeaveOneOut.
The next command:
print(classification_report(originalclass, predictedclass))
prints the full classification report, not just the mean roc_auc metric as in the previous line.
This command takes the following input arguments:
classification_report(y_true, y_pred)
y_true, which in your case is originalclass (the ground truth), and y_pred, which should be the cross-validated predicted labels/classes.
You should have something like this:
y_pred = cross_val_predict(model, X_reduced, y, cv=loo)
classification_report(originalclass, y_pred)
Now y_pred contains cross-validated predictions of the labels, so the classification report will print cross-validated results for the classification metrics.
A toy example to illustrate the above:
from sklearn.metrics import classification_report
originalclass = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
print(classification_report(originalclass, y_pred))
              precision    recall  f1-score   support

           0       0.50      1.00      0.67         1
           1       0.00      0.00      0.00         1
           2       1.00      0.67      0.80         3

   micro avg       0.60      0.60      0.60         5
   macro avg       0.50      0.56      0.49         5
weighted avg       0.70      0.60      0.61         5
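As for the AUROC/AUPRC part of the question: with LeaveOneOut each test fold contains exactly one sample, so a per-fold roc_auc is not well defined. One common workaround is to pool the out-of-fold probability estimates with cross_val_predict and compute the metrics once over the pooled predictions. A runnable sketch under that assumption (make_classification stands in for your data, and LogisticRegression for your classifier):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (average_precision_score, classification_report,
                             roc_auc_score)
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Toy binary dataset standing in for the real 275-sample data
X, y = make_classification(n_samples=60, n_features=5, random_state=0)
model = LogisticRegression()
loo = LeaveOneOut()

# Pool the out-of-fold predicted probabilities for the positive class
proba = cross_val_predict(model, X, y, cv=loo, method='predict_proba')[:, 1]
y_pred = (proba >= 0.5).astype(int)

# Single combined report and threshold-free metrics over all folds
print(classification_report(y, y_pred))
print('AUROC:', roc_auc_score(y, proba))
print('AUPRC:', average_precision_score(y, proba))
```

Note that this gives one pooled AUROC/AUPRC over all leave-one-out predictions rather than an average of per-fold scores, which is the usual way to report these metrics under LeaveOneOut.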