roc_auc_score mismatch between y_test and y_score

Question

I am trying to compute the following:

auc = roc_auc_score(gt, pr, multi_class="ovr")

where gt is a list of 3470208 integers with values from 0 to 41, and pr is a list of the same length (3470208) in which each element is a list of 42 probabilities that sum to 1.

However, I get the following error:

ValueError: Number of classes in y_true not equal to the number of columns in 'y_score'

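So I am a bit lost, because I expected the number of classes in y_true (gt) to be 42, given that it is a list of integers from 0 to 41.

And since pr is a list of lists of size 42, I thought it should work.

Any help would be appreciated!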

Answer 1

Make sure that every integer from 0 to 41 (inclusive) actually occurs in gt.

A simple example:

import numpy as np
from sklearn.metrics import roc_auc_score

# results in error:
gt1 = np.array([0,1,3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1], 
     [0.3, 0.3, 0.2, 0.2], 
     [0.5, 0.1, 0.1, 0.3]]
)
#roc_auc_score(gt1, pr1, multi_class='ovr')


# does not result in error:
gt2 = np.array([0,2,1,3])
pr2 = np.array(
    [[0.1, 0.7, 0.1, 0.1], 
     [0.3, 0.3, 0.2, 0.2], 
     [0.5, 0.1, 0.1, 0.3],
     [0.3, 0.3, 0.2, 0.2]] 
)
#roc_auc_score(gt2, pr2, multi_class='ovr')

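Because the integer/label 2 does not occur in gt1, an error is raised. In other words, the number of classes in gt1 (3) is not equal to the number of columns in pr1 (4).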
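To find out which labels are missing in a large array such as the one in the question, a check along the following lines may help. This is only a minimal sketch: the helper name missing_labels is made up for illustration, and it assumes the labels are the integers 0 to n_columns - 1, so that each label matches a column index of the score matrix.

```python
import numpy as np

def missing_labels(y_true, y_score):
    """Return the column indices of y_score that never occur in y_true.

    Assumes labels are the integers 0..n_columns-1 (hypothetical helper).
    """
    y_true = np.asarray(y_true)
    n_columns = np.asarray(y_score).shape[1]
    expected = np.arange(n_columns)
    # Labels that have a score column but no sample in y_true
    return np.setdiff1d(expected, np.unique(y_true))

gt1 = np.array([0, 1, 3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1],
     [0.3, 0.3, 0.2, 0.2],
     [0.5, 0.1, 0.1, 0.3]]
)
print(missing_labels(gt1, pr1))  # label 2 is absent from gt1
```

If the returned array is empty, every column has at least one sample in y_true and the class count should match the column count.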

Answer 2

The roc_auc_score function has a labels parameter that can be used to declare the missing labels.

Unfortunately, this only works with multi_class="ovo", not with "ovr".

# without labels
gt1 = np.array([0,1,3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1], 
     [0.3, 0.3, 0.2, 0.2], 
     [0.5, 0.1, 0.1, 0.3]]
)
roc_auc_score(gt1, pr1, multi_class='ovo')
> ValueError: Number of classes in y_true not equal to the number of columns in 'y_score'

# with labels and multi-class="ovo":
gt1 = np.array([0,1,3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1], 
     [0.3, 0.3, 0.2, 0.2], 
     [0.5, 0.1, 0.1, 0.3]]
)
roc_auc_score(gt1, pr1, multi_class='ovo', labels=[0, 1, 2, 3])
> 0.5

# with labels and multi-class="ovr":
gt1 = np.array([0,1,3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1], 
     [0.3, 0.3, 0.2, 0.2], 
     [0.5, 0.1, 0.1, 0.3]]
)
roc_auc_score(gt1, pr1, multi_class='ovr', labels=[0, 1, 2, 3])
> ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.

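In this last case, only one class is present in y_true because roc_auc_score iterates over the classes, treating each one in turn as class A and all the others as class B. For class 2, the y_true array becomes [B, B, B], so only one class is present and the ROC AUC score cannot be computed.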
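If a one-vs-rest style score is still wanted when some labels are absent, one possible workaround is to average the binary AUC over only the classes that do occur in y_true. The following is a rough sketch, not scikit-learn's own behavior: the helper name ovr_auc_present_classes is hypothetical, labels are assumed to be integer column indices, and the score columns of absent classes are simply ignored without renormalizing the probabilities.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ovr_auc_present_classes(y_true, y_score):
    """Macro-average of one-vs-rest AUCs over the classes present in y_true.

    Hypothetical helper; assumes integer labels that index y_score columns.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    per_class = []
    for label in np.unique(y_true):
        # Class A = current label (1), class B = everything else (0)
        is_label = (y_true == label).astype(int)
        per_class.append(roc_auc_score(is_label, y_score[:, label]))
    return float(np.mean(per_class))

gt1 = np.array([0, 1, 3])
pr1 = np.array(
    [[0.1, 0.7, 0.1, 0.1],
     [0.3, 0.3, 0.2, 0.2],
     [0.5, 0.1, 0.1, 0.3]]
)
print(ovr_auc_present_classes(gt1, pr1))
```

Skipping a class changes what the average means, so a score computed this way is not directly comparable with one computed over all 42 classes.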


Tags: python, auc, keras