如何在 R 中执行引导至诊断测试？

Question

我正在分析两项诊断测试的准确性，这些测试根据基本事实（黄金标准）检测一个人是否患有疾病（“1”）或没有疾病（“0”）。我想用 95% 置信区间计算灵敏度、特异性和似然比。但是，我需要通过引导程序进行分析，但到目前为止我还无法做到。非常感谢带有以下数据集的示例或有关使用哪个包和编码哪些功能的建议。

这是我的数据库的一个片段

Link for Database

我已经使用 acc.1 测试计算了 CI 的灵敏度、特异性和似然比。但是，我需要对此应用引导程序。

非常感谢！

Answer 1

这是 bootstrap 使用 R 包 rsample. There are many other possibilities like boot, bootstrap 或手动编码的示例，但我喜欢这种方法，因为它非常灵活。首先我加载包

library(dplyr)
library(purrr)
library(rsample)
library(yardstick)
#> For binary classification, the first factor level is assumed to be the event.
#> Set the global option `yardstick.event_first` to `FALSE` to change this.

该警告意味着我们需要设置 options(yardstick.event_first = FALSE)，因为 gold_std、test1 和 test2 的第一个因子水平为 0。

options(yardstick.event_first = FALSE)

然后我加载数据

data <- tibble(
  ID = 1:25, 
  gold_std = factor(c(0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0)), 
  test_1 = factor(c(0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0)), 
  test_2 = factor(c(0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0))
)

混淆矩阵为：

table(select(data, gold_std, test_1))
#>         test_1
#> gold_std 0 1
#>        0 8 7
#>        1 1 9

这意味着灵敏度等于 9 / (9 + 1)，特异性等于 8 / (7 + 8)。确实

sens(data, truth = gold_std, estimate = test_1)
#> # A tibble: 1 x 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 sens    binary           0.9

和

spec(data, truth = gold_std, estimate = test_1)
#> # A tibble: 1 x 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 spec    binary         0.533

函数 sens e spec 在 R 包 yardstick 中定义。我可以使用 R 包 rsample 创建 30 bootstrap 个数据副本：

set.seed(1)
bootstrap_data <- bootstraps(data, times = 30)

这是结果

bootstrap_data
#> # Bootstrap sampling 
#> # A tibble: 30 x 2
#>    splits          id         
#>    <list>          <chr>      
#>  1 <split [25/9]>  Bootstrap01
#>  2 <split [25/13]> Bootstrap02
#>  3 <split [25/11]> Bootstrap03
#>  4 <split [25/7]>  Bootstrap04
#>  5 <split [25/10]> Bootstrap05
#>  6 <split [25/10]> Bootstrap06
#>  7 <split [25/8]>  Bootstrap07
#>  8 <split [25/7]>  Bootstrap08
#>  9 <split [25/9]>  Bootstrap09
#> 10 <split [25/7]>  Bootstrap10
#> # ... with 20 more rows

bootstrap_data 是一个新的数据框，有 30 行（与 boostrap 复制的数量一样多）和 2 列：splits e id。 splits 列记录从每个 bootstrap 复制中排除的观察数，而 id 只是简单的 id 列。例如，在 bootstrap 样本的第一个复制中，排除了 9 个观察值。此外，我们可以使用函数 analysis 来检索在每个 bootrstrap 复制中采样的观察结果。这是存储 bootstrap 采样索引的更有效方法。

现在我可以定义两个函数，它们将一个分割对象作为输入，return 该 boostrap 样本的灵敏度和特异性估计值。

bootstrap_sens <- function(splits) {
  x <- analysis(splits)
  sens(x, truth = gold_std, estimate = test_1)$.estimate
}

bootstrap_spec <- function(splits) {
  x <- analysis(splits)
  spec(x, truth = gold_std, estimate = test_1)$.estimate
}

我使用符号 $.estimate 因为我只需要估计灵敏度和特异性（你可以从开头的输出中看到它们 return 一个数据框）。我可以使用 map_dbl 将这些函数应用于 bootstrap 复制（因为这两个函数的输出都是双精度类型的对象），结果如下：

bootstrap_data <- bootstrap_data %>% 
  mutate(
    sens = map_dbl(splits, bootstrap_sens), 
    spec = map_dbl(splits, bootstrap_spec)
  )
bootstrap_data
#> # Bootstrap sampling 
#> # A tibble: 30 x 4
#>    splits          id           sens  spec
#>  * <list>          <chr>       <dbl> <dbl>
#>  1 <split [25/9]>  Bootstrap01 0.727 0.571
#>  2 <split [25/13]> Bootstrap02 1     0.471
#>  3 <split [25/11]> Bootstrap03 1     0.312
#>  4 <split [25/7]>  Bootstrap04 0.9   0.533
#>  5 <split [25/10]> Bootstrap05 0.8   0.533
#>  6 <split [25/10]> Bootstrap06 1     0.364
#>  7 <split [25/8]>  Bootstrap07 0.9   0.6  
#>  8 <split [25/7]>  Bootstrap08 1     0.533
#>  9 <split [25/9]>  Bootstrap09 0.667 0.625
#> 10 <split [25/7]>  Bootstrap10 0.667 0.75 
#> # ... with 20 more rows

^{由 reprex package (v0.3.0)}

于 2019-07-22 创建

您可以对 test_2 使用相同的方法。我不确定 likelihood ratios 是什么意思，但您可以定义自己的函数来评估它并以与 bootstrap_sens 或 bootstrap_spec 相同的方式应用它。您可以在软件包网站上找到有关 rsample 的更多详细信息。

如何在 R 中执行引导至诊断测试？

How to perform bootstrapping to a diagnostic test in R?

r

statistics-bootstrap