sklearn roc_auc_score with multi_class=="ovr" should have None average available

Question

I'm trying to compute the AUC score for a multiclass problem using sklearn's roc_auc_score() function.

I have a prediction matrix of shape [n_samples, n_classes] and a ground-truth vector of shape [n_samples], named np_pred and np_label respectively.

What I'm trying to achieve is a set of AUC scores, one for each class.

To do so, I would like to set the average parameter to None and the multi_class parameter to "ovr", but if I run

roc_auc_score(y_score=np_pred, y_true=np_label, multi_class="ovr",average=None)

I get back

ValueError: average must be one of ('macro', 'weighted') for multiclass problems

This error is expected from the sklearn function in the multiclass case. However, if you look at the roc_auc_score source code, you can see that when the multi_class parameter is set to "ovr" and the average is one of the accepted values, the multiclass case is treated as a multilabel one, and the internal multilabel function accepts None as the average parameter.

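So, judging from the code, it seems that I should be able to handle a multiclass problem with a None average in the One vs Rest case, but the if conditions in the source code do not allow that combination.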

Am I wrong?

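If I am wrong, then from a theoretical point of view, should I fake a multilabel case just to get a separate AUC for each class, or should I write my own function that loops over the classes and outputs the AUCs?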
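For reference, the "fake a multilabel case" route mentioned above can be sketched as follows. This is a minimal illustration on synthetic data, not a statement about how the library intends the function to be used. The random data generation and the names `logits`, `y_onehot` and `per_class_auc` are assumptions made for the example; only `np_pred` and `np_label` come from the question.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# Synthetic stand-ins for np_label and np_pred (3 classes, 300 samples).
rng = np.random.default_rng(0)
n_samples, n_classes = 300, 3
np_label = rng.integers(0, n_classes, size=n_samples)
logits = rng.normal(size=(n_samples, n_classes))
logits[np.arange(n_samples), np_label] += 1.5  # make the scores informative
np_pred = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# One-hot encode the labels so the target looks like a multilabel indicator.
y_onehot = label_binarize(np_label, classes=np.arange(n_classes))

# With a multilabel indicator target, average=None returns one AUC per column.
per_class_auc = roc_auc_score(y_onehot, np_pred, average=None)
print(per_class_auc)
```

Each entry of `per_class_auc` is the AUC of one class against all the others, which is the one-vs-rest quantity the question asks for.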

Thanks

Answer 1

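As you already know, sklearn's multiclass ROC AUC currently only handles the macro and weighted averages. But it could be implemented, since the score for each class can be returned individually.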

Theoretically, you could implement OvR yourself and calculate roc_auc_score for each class, like this:

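from sklearn.metrics import roc_auc_score

# One-vs-rest: fit one binary classifier per class and score it separately.
roc = {}
for label in multi_class_series.unique():
    selected_classifier.fit(train_set_dataframe, train_class == label)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
    # Column 1 is the probability of the positive class (this label vs. the rest).
    roc[label] = roc_auc_score(test_class == label, predictions_proba[:, 1])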
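If a prediction matrix such as np_pred is already available, refitting a classifier per class is not strictly needed. The sketch below loops over the columns of the existing matrix instead. It is a minimal illustration under assumptions: the helper `per_class_ovr_auc` is hypothetical (not part of sklearn), the data is synthetic, and column `i` of the score matrix is assumed to correspond to `classes[i]`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def per_class_ovr_auc(y_true, y_score, classes):
    """Hypothetical helper: return {class: one-vs-rest AUC}.

    Assumes y_score[:, i] holds the score for classes[i].
    """
    return {
        c: roc_auc_score((y_true == c).astype(int), y_score[:, i])
        for i, c in enumerate(classes)
    }


# Synthetic stand-ins for np_label and np_pred (4 classes, 400 samples).
rng = np.random.default_rng(42)
n_samples, n_classes = 400, 4
np_label = rng.integers(0, n_classes, size=n_samples)
logits = rng.normal(size=(n_samples, n_classes))
logits[np.arange(n_samples), np_label] += 1.0  # make the scores informative
np_pred = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

aucs = per_class_ovr_auc(np_label, np_pred, classes=np.arange(n_classes))
print(aucs)

# Sanity check: the unweighted mean of the per-class AUCs should match
# sklearn's own macro-averaged one-vs-rest score.
macro = roc_auc_score(np_label, np_pred, multi_class="ovr", average="macro")
print(np.mean(list(aucs.values())), macro)
```

Comparing the mean of the per-class values with the macro-averaged score is a quick way to confirm that the loop computes the same one-vs-rest quantities that sklearn averages internally.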

Answer 2

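According to the sklearn documentation, the default value of multi_class is 'raise'. The documentation notes that this default raises an error for multiclass targets, so you have to specify ovr or ovo explicitly, for example multi_class='ovr'.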

Refer to the attached screenshot.
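The default behavior described in this answer can be checked with a short sketch. The synthetic data and the names `error_message` and `score` are assumptions made for the example; the call without multi_class is expected to fail for a multiclass target, while the explicit "ovr" call returns a single averaged score.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Synthetic stand-ins for np_label and np_pred (3 classes, 200 samples).
rng = np.random.default_rng(1)
n_samples, n_classes = 200, 3
np_label = rng.integers(0, n_classes, size=n_samples)
logits = rng.normal(size=(n_samples, n_classes))
logits[np.arange(n_samples), np_label] += 1.0  # make the scores informative
np_pred = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# multi_class is left at its default ("raise"), so a multiclass target errors out.
error_message = None
try:
    roc_auc_score(np_label, np_pred)
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

# Passing multi_class explicitly works and returns the macro-averaged score.
score = roc_auc_score(np_label, np_pred, multi_class="ovr")
print(score)
```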

python

machine-learning

scikit-learn

auc