Zero-R model: calculating sensitivity and specificity with caret's confusionMatrix() (Confusion Matrix and Statistics)

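Below is the output of my confusionMatrix() call in R for a Zero-R model. I may have set the function up incorrectly: the sensitivities I calculated by hand (which differ with each random seed) do not match confusionMatrix(), which reports a sensitivity of exactly 1.0000: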

> sensitivity1 = 213/(213+128)
> sensitivity2 = 211/(211+130)
> sensitivity3 = 215/(215+126)
> #specificity = 0/(0+0) there were no other predictions
> specificity = 0
> specificity
[1] 0
> sensitivity1
[1] 0.6246334
> sensitivity2
[1] 0.6187683
> sensitivity3
[1] 0.6304985

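There is a warning message, but the function still appears to run and refactors the data to match because the levels are in a different order, which may be down to the train/test ordering and the randomization. I went back and checked that the train and test sets were not reverse-ordered by a negative sign and did not have different numbers of rows. Here is the output of caret's confusionMatrix() function: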

> confusionMatrix(as.factor(testDiagnosisPred), as.factor(testDiagnosis), positive="B") 
Confusion Matrix and Statistics

          Reference
Prediction   B   M
         B 211 130
         M   0   0
                                          
               Accuracy : 0.6188          
                 95% CI : (0.5649, 0.6706)
    No Information Rate : 0.6188          
    P-Value [Acc > NIR] : 0.524           
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 1.0000          
            Specificity : 0.0000          
         Pos Pred Value : 0.6188          
         Neg Pred Value :    NaN          
             Prevalence : 0.6188          
         Detection Rate : 0.6188          
   Detection Prevalence : 1.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : B               
                                          
Warning message:
In confusionMatrix.default(as.factor(testDiagnosisPred), as.factor(testDiagnosis),  :
  Levels are not in the same order for reference and data. Refactoring data to match.
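
The warning most likely appears because as.factor(testDiagnosisPred) has only the level "B" while the reference has both "B" and "M". A minimal sketch of one way to avoid it, assuming the label set is exactly "B" and "M". The vectors `pred` and `ref` and the name `lv` are stand-ins built from the counts above, not the original objects:

```r
# Stand-in data matching the counts shown above (not the original vectors).
lv   <- c("B", "M")                                   # assumed label set
pred <- factor(rep("B", 341), levels = lv)            # ZeroR: always predicts "B"
ref  <- factor(rep(c("B", "M"), times = c(211, 130)), levels = lv)

# Both factors now carry the same levels in the same order.
stopifnot(identical(levels(pred), levels(ref)))

# Base-R cross-tabulation: rows are predictions, columns are the reference.
cm <- table(Prediction = pred, Reference = ref)
print(cm)

# If caret is installed, the same factors can be passed straight through.
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::confusionMatrix(pred, ref, positive = "B"))
}
```

Setting the levels explicitly on both sides keeps the empty "M" prediction row in the table, so the statistics are unchanged; only the refactoring step inside confusionMatrix() becomes unnecessary.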

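testDiagnosisPred simply shows that the model guesses benign (B) as the diagnosis for every cancer test in the data set. The counts vary by seed because the actual benign (B) and malignant (M) outcomes are randomized each time.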

testDiagnosisPred
  B 
341 
> ## testDiagnosisPred
> ##   B 
> ## 228
> 
> majorityClass # confusion matrix

  B   M 
211 130 
> ## 
> ##   B   M 
> ## 213 128
> 
> # another seed's confusion matrix
> ## B   M 
> ## 211 130 

Here is some of the data, shown with the head() and str() functions:

> head(testDiagnosisPred)
[1] "B" "B" "B" "B" "B" "B"
> head(cancerdata.train$Diagnosis)
[1] "B" "B" "M" "M" "M" "B"
> head(testDiagnosis)
[1] "B" "B" "M" "M" "M" "B"
> 
> str(testDiagnosisPred)
 chr [1:341] "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" ...
> str(cancerdata.train$Diagnosis)
 chr [1:341] "B" "B" "M" "M" "M" "B" "B" "B" "M" "M" "M" "B" "M" "M" "B" "M" "B" "B" "B" "M" "B" "B" "B" "B" ...
> str(testDiagnosis)
 chr [1:341] "B" "B" "M" "M" "M" "B" "B" "B" "M" "M" "M" "B" "M" "M" "B" "M" "B" "B" "B" "M" "B" "B" "B" "B" ...
> 

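My confusion over the confusion matrix, and over the sensitivity and specificity calculations, came from reading the matrix horizontally instead of vertically. The correct answer is the one from caret's confusionMatrix() function. Another way to see this is that this is a ZeroR model: after further investigation, it always has a sensitivity of 1.00 and a specificity of 0.00 whenever the majority class is also the positive class, as it is here with positive="B". That is because a ZeroR model uses zero rules and zero attributes and simply predicts the majority class.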

> confusionMatrix(as.factor(testDiagnosisPred), as.factor(testDiagnosis), positive="B") 
Confusion Matrix and Statistics

          Reference
Prediction   B   M
         B 211 130
         M   0   0
                                          
               Accuracy : 0.6188                  
                                          
            Sensitivity : 1.0000          
            Specificity : 0.0000 
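
The claim that a ZeroR model always scores 1.00 sensitivity and 0.00 specificity can be checked with a short base-R sketch. The helper names `zero_r_predict` and `sens_spec` are hypothetical, and for brevity the test labels double as the "training" labels used to pick the majority class, which is an assumption rather than what the original code did:

```r
# ZeroR: ignore all attributes and always predict the majority class.
zero_r_predict <- function(train_labels, n) {
  majority <- names(which.max(table(train_labels)))
  rep(majority, n)
}

# Sensitivity and specificity for a chosen positive class.
sens_spec <- function(pred, ref, positive) {
  tp <- sum(pred == positive & ref == positive)
  fn <- sum(pred != positive & ref == positive)
  tn <- sum(pred != positive & ref != positive)
  fp <- sum(pred == positive & ref != positive)
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}

# Benign/malignant counts from the three seeds in the question.
seeds <- list(c(B = 213, M = 128), c(B = 211, M = 130), c(B = 215, M = 126))

results <- t(sapply(seeds, function(counts) {
  ref  <- rep(names(counts), times = counts)
  pred <- zero_r_predict(ref, length(ref))  # assumption: labels reused as training data
  sens_spec(pred, ref, positive = "B")
}))
print(results)
```

Every seed gives the same pair of values because no malignant case is ever predicted. Passing positive = "M" instead would flip the result to a sensitivity of 0 and a specificity of 1.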

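When I did these manual specificity and sensitivity calculations, I was misreading the confusion matrix horizontally rather than vertically: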

> sensitivity1 = 213/(213+128)
> sensitivity2 = 211/(211+130)
> sensitivity3 = 215/(215+126)
> #specificity = 0/(0+0) there were no other predictions
> specificity = 0
> specificity
[1] 0
> sensitivity1
[1] 0.6246334
> sensitivity2
[1] 0.6187683
> sensitivity3
[1] 0.6304985
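
To make the horizontal versus vertical reading concrete, here is a small sketch that rebuilds the 211/130 matrix by hand (the object name `cm` is illustrative) and computes both readings. It assumes the caret layout shown above, with predictions in rows and the reference in columns:

```r
# Confusion matrix from the confusionMatrix() output:
# rows = Prediction, columns = Reference (matrix() fills column by column).
cm <- matrix(c(211, 0, 130, 0), nrow = 2,
             dimnames = list(Prediction = c("B", "M"), Reference = c("B", "M")))

# Reading across the "B" row gives the positive predictive value, not sensitivity.
row_wise <- cm["B", "B"] / sum(cm["B", ])      # 211 / (211 + 130)

# Reading down the columns gives the values caret reports.
sensitivity <- cm["B", "B"] / sum(cm[, "B"])   # 211 / (211 + 0)
specificity <- cm["M", "M"] / sum(cm[, "M"])   # 0 / (0 + 130)

print(round(c(row_wise = row_wise,
              sensitivity = sensitivity,
              specificity = specificity), 4))
```

The row-wise figure matches the Pos Pred Value of 0.6188 in the confusionMatrix() output, which is why the hand calculation looked plausible while disagreeing with the reported sensitivity.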