Multiple chi-square tests in R

Question

假设我有以下数据：

ID  Drug1  Drug2  Drug3  Drug4
1   1      0      0      0
2   0      0      0      1
3   0      1      0      0
4   0      0      1      0
5   1      0      0      0

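Here ID is the number assigned to each patient, and each drug variable is binary: 1 means the patient had a certain condition with that drug, and 0 means they did not.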

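To compare the proportion of patients with the condition between drugs, I want to run chi-square tests for each pair, e.g. Drug1 vs Drug2, Drug1 vs Drug3, Drug1 vs Drug4, Drug2 vs Drug3, Drug2 vs Drug4, and so on.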

How can I do this in R with a single line of code? Also, is it necessary to correct for multiple comparisons (e.g. Bonferroni)?

Answer 1

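Below is a tidyverse approach using {dplyr}. I first generate some data so that the tests give meaningful results. Then we get all drug pairs by applying combn() to the column names of mydat. Next we use rowwise() and mutate() to apply chisq.test() to each row, using the strings in V1 and V2 to select the corresponding columns of mydat. Because we are inside a data.frame and the test result is not an atomic vector, we have to wrap it in list(). Finally, we extract the p-value from chisq_test with $p.value.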

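library(dplyr)
set.seed(123)

# Simulate 1000 patients with four drug outcomes
mydat <- tibble(ID = 1:1000,
                Drug1 = round(rnorm(1000, 0.8, sd = 0.5)),
                Drug2 = round(rnorm(1000, 0.6, sd = 0.5), 0),
                Drug3 = round(rnorm(1000, 0.5, sd = 0.5), 0),
                Drug4 = round(rnorm(1000, 0.3, sd = 0.3), 0)
                ) %>% 
  # Recode to 0/1: positive values become 0, everything else becomes 1
  mutate(across(starts_with("Drug"), ~ case_when(.x > 0 ~ 0,
                                                 .x < 1 ~ 1,
                                                 TRUE ~ .x))
  )

mydat %>% 
  select(-ID) %>% 
  colnames() %>% 
  combn(2) %>%      # 2 x 6 matrix: one column per drug pair
  t() %>%           # 6 x 2 matrix: one row per drug pair
  as_tibble() %>%   # columns get the default names V1 and V2
  rowwise() %>% 
  mutate(chisq_test = list(
    # 2 x 2 contingency table of the two drug columns, then the test
    table(mydat[[V1]], mydat[[V2]]) %>% chisq.test()
    ),
    chisq_pval = chisq_test$p.value
    )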

#> Using compatibility `.name_repair`.
#> # A tibble: 6 x 4
#> # Rowwise: 
#>   V1    V2    chisq_test chisq_pval
#>   <chr> <chr> <list>          <dbl>
#> 1 Drug1 Drug2 <htest>       0.00694
#> 2 Drug1 Drug3 <htest>       0.298  
#> 3 Drug1 Drug4 <htest>       0.926  
#> 4 Drug2 Drug3 <htest>       0.998  
#> 5 Drug2 Drug4 <htest>       0.574  
#> 6 Drug3 Drug4 <htest>       0.895

Created on 2022-04-04 by the reprex package (v2.0.1)
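The answer above does not show the Bonferroni part of the question. As a minimal base-R sketch (an alternative to the dplyr pipeline, not the answerer's own code), combn() with its FUN argument runs one test per pair in a single call, and stats::p.adjust() applies the correction. It assumes the same simulated data, regenerated here with the same seed and draw order, so the unadjusted p-values should match the chisq_pval column above. The names drugs, drug_pairs, p_raw and res are illustrative only.

```r
set.seed(123)  # same seed and draw order as the simulation above
mydat <- data.frame(ID = 1:1000,
                    Drug1 = round(rnorm(1000, 0.8, sd = 0.5)),
                    Drug2 = round(rnorm(1000, 0.6, sd = 0.5)),
                    Drug3 = round(rnorm(1000, 0.5, sd = 0.5)),
                    Drug4 = round(rnorm(1000, 0.3, sd = 0.3)))
drugs <- setdiff(names(mydat), "ID")
# Same recoding as above: positive values -> 0, everything else -> 1
mydat[drugs] <- lapply(mydat[drugs], function(x) ifelse(x > 0, 0, 1))

# All pairs of drug columns (2 x 6 character matrix)
drug_pairs <- combn(drugs, 2)

# One chi-square test on the 2 x 2 table of each pair; keep only the p-value
p_raw <- combn(drugs, 2, FUN = function(p)
  chisq.test(table(mydat[[p[1]]], mydat[[p[2]]]))$p.value)

res <- data.frame(V1 = drug_pairs[1, ], V2 = drug_pairs[2, ],
                  p_raw = p_raw,
                  p_bonferroni = p.adjust(p_raw, method = "bonferroni"),
                  p_holm = p.adjust(p_raw, method = "holm"),
                  stringsAsFactors = FALSE)
res
```

With six tests, the Bonferroni adjustment multiplies each p-value by 6 (capped at 1). Holm's step-down method is shown alongside because it controls the same family-wise error rate and its adjusted p-values are never larger than Bonferroni's.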

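Below is my old answer. It compared the distribution of 0s and 1s within each drug, which, as @KU99 correctly pointed out in the comments, is not what the OP asked for.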

library(tibble) # for reading in your data

mydat <-
  tribble(~ID, ~Drug1,  ~Drug2, ~Drug3,  ~Drug4,
           1, 1,      0,      0,      0,  
           2, 0,      0,      0,      1,  
           3, 0,      1,      0,      0,  
           4, 0,      0,      1,      0,  
           5, 1,      0,      0,      0
  )

# Goodness-of-fit test of the 0/1 counts within each drug column
lapply(mydat[, -1], function(x) chisq.test(table(x)))

#> $Drug1
#> 
#>  Chi-squared test for given probabilities
#> 
#> data:  table(x)
#> X-squared = 0.2, df = 1, p-value = 0.6547
#> 
#> 
#> $Drug2
#> 
#>  Chi-squared test for given probabilities
#> 
#> data:  table(x)
#> X-squared = 1.8, df = 1, p-value = 0.1797
#> 
#> 
#> $Drug3
#> 
#>  Chi-squared test for given probabilities
#> 
#> data:  table(x)
#> X-squared = 1.8, df = 1, p-value = 0.1797
#> 
#> 
#> $Drug4
#> 
#>  Chi-squared test for given probabilities
#> 
#> data:  table(x)
#> X-squared = 1.8, df = 1, p-value = 0.1797

Created on 2022-03-29 by the reprex package (v0.3.0)
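The X-squared values in the old output can be checked by hand. With no p argument, chisq.test() on a one-way table tests equal cell probabilities, so with five patients the expected count in each cell is 2.5. In the example data Drug1 has three 0s and two 1s, while Drug2, Drug3 and Drug4 each have four 0s and one 1:

```latex
\begin{aligned}
X^2_{\text{Drug1}} &= \frac{(3 - 2.5)^2}{2.5} + \frac{(2 - 2.5)^2}{2.5} = 0.1 + 0.1 = 0.2, \\
X^2_{\text{Drug2}} = X^2_{\text{Drug3}} = X^2_{\text{Drug4}} &= \frac{(4 - 2.5)^2}{2.5} + \frac{(1 - 2.5)^2}{2.5} = 0.9 + 0.9 = 1.8,
\end{aligned}
```

each with 2 - 1 = 1 degree of freedom. Each statistic only involves one drug column, which is why these tests say nothing about differences between drugs.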

Tags: r, stat, chi-squared, bonferroni