Sensitivity and specificity reported by coords() and confusionMatrix() in R at optimal cut-point don't match

Question

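I trained an SVM with a linear kernel in R to classify patients with a disease, used the predict() function with the SVM model to generate predicted probabilities on the test set, and then generated a ROC curve with the roc() function from the pROC library. I also used coords() to compute the optimal cut-point using the Youden index. The cut-point returned by coords() was 0.8489392, with a specificity of 0.6250000 and a sensitivity of 0.7954545.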

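When I try to generate a confusion matrix using predictions made at this cut-point, I get a sensitivity of 0.20455 and a specificity of 0.37500, and I can't figure out why they don't match the sensitivity and specificity reported by coords().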

This is the only model, out of several, for which the sensitivity and specificity reported by the two functions don't match.

The code is as follows:

svm_linear <- train(ercp_chole ~ stone_any_modality + age + peak_pre_bili + max_cbd_dia_any,
    data = chole_training,
    method = "svmLinear",
    trControl = trainControl(method = "repeatedcv", number = 10, repeats = 3, classProbs=TRUE, summaryFunction=twoClassSummary),
    na.action = na.exclude,
    preProcess = c("center", "scale"),
    metric = "ROC",
    tuneLength = 10
)

pprob_svm_linear <- predict(svm_linear, chole_testing, type="prob")
svm_linear_roc <- roc(chole_testing$ercp_chole, pprob_svm_linear[,2], auc=TRUE)
coords(svm_linear_roc, "best", "threshold", transpose=TRUE, best.method="youden")

confusionMatrix(factor( ifelse(pprob_svm_linear[, "chole_pos"] > 0.8489392, "chole_pos", "chole_neg") ), chole_testing$ercp_chole, positive="chole_pos")

Output of the call to roc():

Setting levels: control = chole_neg, case = chole_pos
Setting direction: controls > cases

Output of the call to coords():

  threshold specificity sensitivity
  0.8489392   0.6250000   0.7954545

Output of the call to confusionMatrix():

Confusion Matrix and Statistics

           Reference
Prediction  chole_neg chole_pos
  chole_neg         3        35
  chole_pos         5         9

               Accuracy : 0.2308
                 95% CI : (0.1253, 0.3684)
    No Information Rate : 0.8462
    P-Value [Acc > NIR] : 1

                  Kappa : -0.1659

 Mcnemar's Test P-Value : 4.533e-06

            Sensitivity : 0.20455
            Specificity : 0.37500
         Pos Pred Value : 0.64286
         Neg Pred Value : 0.07895
             Prevalence : 0.84615
         Detection Rate : 0.17308
   Detection Prevalence : 0.26923
      Balanced Accuracy : 0.28977

       'Positive' Class : chole_pos

Any help would be much appreciated!

Thanks in advance.

Answer 1

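pROC automatically detects whether the control (negative) observations have higher or lower values than the cases (positive). As you mentioned, you can see this in the output of the roc function: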

Setting levels: control = chole_neg, case = chole_pos
Setting direction: controls > cases

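confusionMatrix, by contrast, does no such detection: it only tabulates the predicted labels it is given, and the ifelse() call with `>` in your code assumes that positive observations have higher values. Under that assumption the ROC curve is "reversed" and has an AUC < 0.5.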
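The mismatch can be checked with plain arithmetic on the counts printed in the question. The sketch below uses base R only; the matrix `cm` is typed in by hand from the confusionMatrix() output, and the variable names are illustrative. It suggests that flipping every prediction (classifying as positive when the value is below the cut-point rather than above it) reproduces the figures reported by coords().

```r
# Counts copied from the confusionMatrix() output in the question
# (rows = prediction, columns = reference); matrix() fills column by column.
cm <- matrix(c(3, 5, 35, 9), nrow = 2,
             dimnames = list(Prediction = c("chole_neg", "chole_pos"),
                             Reference  = c("chole_neg", "chole_pos")))

# As reported: predictions were made with `> threshold`.
sens_reported <- cm["chole_pos", "chole_pos"] / sum(cm[, "chole_pos"])  # 9 / 44
spec_reported <- cm["chole_neg", "chole_neg"] / sum(cm[, "chole_neg"])  # 3 / 8

# Flipping every prediction (using `< threshold`) swaps the two rows.
cm_flipped <- cm[c("chole_pos", "chole_neg"), ]
rownames(cm_flipped) <- c("chole_neg", "chole_pos")
sens_flipped <- cm_flipped["chole_pos", "chole_pos"] / sum(cm_flipped[, "chole_pos"])  # 35 / 44
spec_flipped <- cm_flipped["chole_neg", "chole_neg"] / sum(cm_flipped[, "chole_neg"])  # 5 / 8

round(c(sens_reported, spec_reported), 5)  # 0.20455 0.37500 (confusionMatrix values)
round(c(sens_flipped, spec_flipped), 7)    # 0.7954545 0.6250000 (coords values)
```

In other words, the two sets of numbers are complements of each other (1 - 0.20455 and 1 - 0.37500), which is what one would expect if the two functions disagree only on the direction of the comparison.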

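It is a good idea to set both the levels (in the order negative, positive) and the direction explicitly. To do that, look at your data and work out whether the controls or the cases have higher values, and whether that makes sense. Then you can call the roc function with these values explicitly. For example, if in your case the negatives really do have higher values: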

svm_linear_roc <- roc(chole_testing$ercp_chole,
                      pprob_svm_linear[,2],
                      auc=TRUE,
                      levels = c("chole_neg", "chole_pos"),
                      direction = ">")
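Once the direction is fixed, the manual classification has to use the same direction. The following is a minimal sketch on simulated data, not the asker's data: the scores, group sizes, and variable names (`truth`, `score`, `thr`) are made up for illustration, and a plain table() stands in for confusionMatrix(). It assumes pROC is installed and that direction = ">" means an observation is called positive when its value falls below the threshold.

```r
library(pROC)

set.seed(42)
# Hypothetical data: 8 controls and 44 cases, with the cases scoring LOWER,
# mimicking the "controls > cases" situation in the question.
truth <- factor(rep(c("chole_neg", "chole_pos"), times = c(8, 44)),
                levels = c("chole_neg", "chole_pos"))
score <- c(rnorm(8, mean = 0.8, sd = 0.1), rnorm(44, mean = 0.6, sd = 0.15))

# Levels and direction are set explicitly, so nothing is auto-detected.
roc_obj <- roc(truth, score,
               levels = c("chole_neg", "chole_pos"),
               direction = ">")

best <- coords(roc_obj, "best", best.method = "youden",
               ret = c("threshold", "specificity", "sensitivity"),
               transpose = FALSE)
thr <- best$threshold[1]  # take the first row in case of tied optima

# With direction = ">", positives are the observations BELOW the threshold.
pred <- factor(ifelse(score < thr, "chole_pos", "chole_neg"),
               levels = levels(truth))
tab <- table(Prediction = pred, Reference = truth)

sens <- tab["chole_pos", "chole_pos"] / sum(tab[, "chole_pos"])
spec <- tab["chole_neg", "chole_neg"] / sum(tab[, "chole_neg"])

# Compare the hand-computed values with those reported by coords().
c(sensitivity = sens, specificity = spec)
best
```

Applied to the code in the question, this would correspond to using `<` instead of `>` inside the ifelse() call, provided direction ">" genuinely reflects the data. If lower predicted probabilities for the positive class do not make sense for the model, the column or level ordering is worth checking first.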

