Model performance calculation when class labels are not as expected

Hi, I am training a neural network. The labels in the training dataset are benign or malignant, so I converted them to numeric values using

import pandas as pd

class_data = pd.factorize(class_data)[0]

So now malignant is mapped to 0 (cancerous) and benign to 1 (non-cancerous).
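
For reference, this mapping presumably comes from pd.factorize() assigning codes in order of first appearance when sort is not set, so whichever label occurs first in the data becomes 0. A minimal sketch (the string labels here are just an assumption about the raw data):

import pandas as pd

# hypothetical raw labels; the real column is not shown here
raw = pd.Series(["malignant", "benign", "benign", "malignant"])

codes, uniques = pd.factorize(raw)
print(codes)    # [0 1 1 0] -> codes follow order of first appearance
print(uniques)  # Index(['malignant', 'benign'], dtype='object')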

My confusion matrix now looks like this (the cell counts are listed in the PS below).

I need to calculate sensitivity and specificity. My calculation is as follows:

from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(test_y, y_pred).ravel()

# Accuracy : 
acc_ = (tp + tn) / (tp + tn + fn + fp)
print("Accuracy  : ", acc_)
# Sensitivity : 
sens_ = tp / (tp + fn)
print("Sensitivity  : ", sens_)
# Specificity 
sp_ = tn / (tn + fp)
print("Specificity  : ", sp_)
# False positive rate (FPR)
FPR = fp / (tn + fp)
print("False positive rate  : ", FPR)

Since my class labels are encoded the other way round, could someone let me know whether these calculations are being misinterpreted? PS:

tn = 29
fp = 15
fn = 14
tp = 85
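
Plugging these counts into the formulas above (with the same (tn, fp, fn, tp) unpacking) gives, approximately:

acc_  = (85 + 29) / (85 + 29 + 14 + 15)   # ~0.797
sens_ = 85 / (85 + 14)                    # ~0.859
sp_   = 29 / (29 + 15)                    # ~0.659
FPR   = 15 / (29 + 15)                    # ~0.341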

To be on the safe side, you could just calculate each cell explicitly instead of using sklearn.metrics.confusion_matrix(), and then go from there:

# some fake data (assumes labels are boolean)
test_y = [True, True, False, False, True]
y_pred = [True, False, True, False, True]

idx_range = range(len(test_y))

tn = sum([not test_y[idx] and not y_pred[idx] for idx in idx_range])
fp = sum([not test_y[idx] and y_pred[idx] for idx in idx_range])
fn = sum([test_y[idx] and not y_pred[idx] for idx in idx_range])
tp = sum([test_y[idx] and y_pred[idx] for idx in idx_range])

# ... and then calculate the metrics 
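
With the fake data above you can sanity-check these hand-computed cells against sklearn by spelling out the label order explicitly (a quick illustration, assuming scikit-learn is installed):

from sklearn.metrics import confusion_matrix

# The first entry in labels is treated as the negative class, so
# ravel() returns the cells in (tn, fp, fn, tp) order.
tn_sk, fp_sk, fn_sk, tp_sk = confusion_matrix(
    test_y, y_pred, labels=[False, True]
).ravel()

print((tn, fp, fn, tp))              # (1, 1, 1, 2) for the fake data above
print((tn_sk, fp_sk, fn_sk, tp_sk))  # should match the hand-computed cells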

If you prefer to use pandas.factorize(), you can also force True to map to 1 and False to map to 0 by setting sort=True:

import pandas as pd

test_y = [True, True, False, False, True]
y_pred = [True, False, True, False, True]

# pd.factorize() returns a tuple so get the data (0th elem)
test_y_factor = pd.factorize(test_y, sort=True)[0]
y_pred_factor = pd.factorize(y_pred, sort=True)[0]

# confirm that the translation happened properly:
[*zip(test_y, test_y_factor)]
## 
## output: 
## [(True, 1), (True, 1), (False, 0), (False, 0), (True, 1)]
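
Applied to the original string labels this also gives the more conventional encoding, because sort=True orders the unique values alphabetically (a small sketch; the exact label strings in your data are an assumption):

labels = ["malignant", "benign", "benign", "malignant"]  # hypothetical raw labels

codes, uniques = pd.factorize(labels, sort=True)
print(codes)    # [1 0 0 1] -> 'benign' sorts first, so benign = 0, malignant = 1
print(uniques)  # ['benign' 'malignant']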

To make sure your calculations are correct, you can also work out the F1 score by hand, as

F1_score = 2*tp / (2*tp + fp + fn)

and then compare your value with

sklearn.metrics.f1_score(test_y, y_pred)
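
A minimal check with the fake boolean data and the cells computed above: if the hand-computed value and sklearn's disagree, the class sklearn treats as positive is probably not the one your tp refers to.

from sklearn.metrics import f1_score

# tp, fp, fn are the explicit cell counts computed earlier
f1_manual = 2 * tp / (2 * tp + fp + fn)

print(f1_manual)                 # ~0.667 for the fake data
print(f1_score(test_y, y_pred))  # should agree when the positive class lines up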

You can also use the labels argument to make sure the labels are in the order you expect:

confusion_matrix(test_y, y_pred, labels=[0, 1]).ravel()
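
For example, with your current encoding (malignant = 0, benign = 1), reversing the label order makes the same (tn, fp, fn, tp) unpacking treat malignant as the positive class (a sketch, assuming test_y and y_pred are the 0/1-encoded arrays):

# labels=[0, 1]: class 0 comes first (negative), so tp counts class 1 (benign)
tn, fp, fn, tp = confusion_matrix(test_y, y_pred, labels=[0, 1]).ravel()

# labels=[1, 0]: class 1 comes first, so the same unpacking now treats
# class 0 (malignant) as the positive class
tn, fp, fn, tp = confusion_matrix(test_y, y_pred, labels=[1, 0]).ravel()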