Permute a column within a level, perform a test on 2 columns, and save the p-values

Question

I have a data frame:

> dput(df)
structure(list(id = c(1, 2, 3, 4, 1, 2, 3, 4), level = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("g01", "g02"), class = "factor"), 
    m_col = c(1, 2, 3, 4, 11, 22, 33, 44), u_col = c(11, 12, 
    13, 14, 21, 22, 23, 24), group = c(0, 0, 1, 1, 0, 0, 1, 1
    )), row.names = c(NA, -8L), class = "data.frame")

which looks like this:

  id level m_col u_col group
1  1   g01     1    11     0
2  2   g01     2    12     0
3  3   g01     3    13     1
4  4   g01     4    14     1
5  1   g02    11    21     0
6  2   g02    22    22     0
7  3   g02    33    23     1
8  4   g02    44    24     1

I want to run a weighted binomial test for each 'level' (I need to compare u_col and m_col for each id), so using tidyverse and broom I can do the following:

res <- df %>% 
  group_by(level) %>% 
  do(tidy(glm(cbind(.$m_col,.$u_col) ~ .$group, family="binomial"))) %>%
  filter(term == ".$group")

This gives me one p-value per level:

> res
# A tibble: 2 x 6
# Groups:   level [2]
  level term    estimate std.error statistic p.value
  <fct> <chr>      <dbl>     <dbl>     <dbl>   <dbl>
1 g01   .$group    0.687     0.746     0.921  0.357 
2 g02   .$group    0.758     0.296     2.56   0.0105

I can then ask how many of them have p < 0.05:

length(which(res$p.value < 0.05))

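I now want to permute the data, repeat the binomial test, count how many p-values are < 0.05, store that count, and then repeat this another 999 times.

However, the permutation needs to shuffle the 'group' column randomly within each 'level'. I am struggling to find a way to do this. For example, one permutation would look like this: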

  id level m_col u_col group
1  1   g01     1    11     1
2  2   g01     2    12     0
3  3   g01     3    13     1
4  4   g01     4    14     0
5  1   g02    11    21     1
6  2   g02    22    22     0
7  3   g02    33    23     1
8  4   g02    44    24     0

and a second one would look like this:

  id level m_col u_col group
1  1   g01     1    11     0
2  2   g01     2    12     1
3  3   g01     3    13     1
4  4   g01     4    14     0
5  1   g02    11    21     0
6  2   g02    22    22     1
7  3   g02    33    23     1
8  4   g02    44    24     0

and so on.

Having the test depend on 2 columns limits the shuffling options, and I am stuck. Any suggestions would be appreciated.

Answer 1

If you want a data frame, you can try this:

library(tidyverse)
library(broom) # tidy() comes from broom, which library(tidyverse) does not attach
map_dfr(1:1000, ~ df %>%
                   group_by(level) %>%
                   mutate(group = group[sample(row_number())]) %>% # permutation shuffle the 'group' column within each 'level'. 
                   do(tidy(glm(cbind(.$m_col,.$u_col) ~ .$group, family="binomial"))) %>%
                   filter(term == ".$group") %>% 
                   ungroup() %>% 
                   summarise(sum(p.value < 0.05))) # ask how many p<0.05

If you want a vector:

map_dbl(1:1000, ~ df %>%
                   group_by(level) %>%
                   mutate(group = group[sample(row_number())]) %>% # permutation shuffle the 'group' column within each 'level'. 
                   do(tidy(glm(cbind(.$m_col,.$u_col) ~ .$group, family="binomial"))) %>%
                   filter(term == ".$group") %>% 
                   ungroup() %>% 
                   summarise(sum(p.value < 0.05)) %>% # ask how many p<0.05
                   pull())
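
To see what the shuffle step does on its own, here is a minimal sketch that applies only the grouped mutate() from the code above and then counts the labels per level. The data frame is rebuilt from the question's dput() output; the seed and the name shuffled are assumptions added for illustration.

```r
library(dplyr)

# Same data as in the question, rebuilt by hand
df <- data.frame(
  id    = rep(1:4, 2),
  level = factor(rep(c("g01", "g02"), each = 4)),
  m_col = c(1, 2, 3, 4, 11, 22, 33, 44),
  u_col = c(11, 12, 13, 14, 21, 22, 23, 24),
  group = rep(c(0, 0, 1, 1), 2)
)

set.seed(1) # assumption: fixed seed, only so the sketch is reproducible

# Shuffle 'group' separately inside each level; all other columns stay put
shuffled <- df %>%
  group_by(level) %>%
  mutate(group = group[sample(row_number())]) %>%
  ungroup()

# Each level should still contain two 0s and two 1s
count(shuffled, level, group)
```

Because the labels are only reordered inside each level, m_col and u_col stay paired with their original id, which is what the two-column test needs.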

Answer 2

You can write a function:

library(dplyr)
library(broom)

apply_fun <- function(data) {
  sum(subset(tidy(glm(cbind(m_col, u_col)~group, data, 
             family="binomial")), term == 'group')$p.value < 0.05)
}

Then repeat it with replicate:

result <- replicate(100, df %>%
  group_by(level) %>%
  mutate(group = sample(group)) %>%
  summarise(value = apply_fun(cur_data())), simplify = FALSE)
result
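
For comparison, the same procedure can be sketched in base R, returning one count per permutation as a plain vector. This is only an illustration under stated assumptions, not the method of either answer: the helper names count_sig and permute_within, the alpha argument, and the seed are hypothetical, and the p-value is read from summary() instead of broom's tidy().

```r
# Same data as in the question, rebuilt by hand
df <- data.frame(
  id    = rep(1:4, 2),
  level = factor(rep(c("g01", "g02"), each = 4)),
  m_col = c(1, 2, 3, 4, 11, 22, 33, 44),
  u_col = c(11, 12, 13, 14, 21, 22, 23, 24),
  group = rep(c(0, 0, 1, 1), 2)
)

# Hypothetical helper: fit one binomial glm per level and
# count how many 'group' p-values fall below alpha
count_sig <- function(data, alpha = 0.05) {
  p <- sapply(split(data, data$level), function(d) {
    fit <- glm(cbind(m_col, u_col) ~ group, data = d, family = "binomial")
    summary(fit)$coefficients["group", "Pr(>|z|)"]
  })
  sum(p < alpha)
}

# Hypothetical helper: shuffle 'group' within each level, leave the rest alone
permute_within <- function(data) {
  data$group <- ave(data$group, data$level, FUN = sample)
  data
}

set.seed(42) # assumption: fixed seed, only so the sketch is reproducible

observed    <- count_sig(df)                                   # count for the unpermuted data
perm_counts <- replicate(999, count_sig(permute_within(df)))   # one count per permutation

table(perm_counts)
```

Here observed is the count for the original data and perm_counts holds the 999 permuted counts, so the two can be compared directly.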
