Expand table of counts to a dataframe

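Given the table of counts specified in 'dat', I would like to create a data frame with 3 columns (race, grp and outcome) and 206 rows. The variable outcome should be 1 if 'ascertained' and 0 if 'missed'.
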
dat <- structure(list(race = structure(c(1L, 2L, 1L, 2L), levels = c("black", 
"nonblack"), class = "factor"), grp = structure(c(1L, 1L, 2L, 
2L), levels = c("hbpm", "uc"), class = "factor"), ascertained = c(63, 
32, 24, 21), missed = c(5, 3, 49, 9), total = c(68, 35, 73, 30
)), class = "data.frame", row.names = c(NA, -4L))

How about this:

library(tidyverse)
dat <- structure(list(race = structure(c(1L, 2L, 1L, 2L), levels = c("black", 
"nonblack"), class = "factor"), grp = structure(c(1L, 1L, 2L, 
2L), levels = c("hbpm", "uc"), class = "factor"), ascertained = c(63, 
32, 24, 21), missed = c(5, 3, 49, 9), total = c(68, 35, 73, 30
)), class = "data.frame", row.names = c(NA, -4L))
dat2 <- dat %>% select(-total) %>% 
  pivot_longer(c(ascertained, missed), names_to = "var", values_to="vals") %>% 
  uncount(vals) %>% 
  mutate(outcome = case_when(var == "ascertained" ~ 1, 
                             TRUE ~ 0)) %>% 
  select(-var)
head(dat2)
#> # A tibble: 6 × 3
#>   race  grp   outcome
#>   <fct> <fct>   <dbl>
#> 1 black hbpm        1
#> 2 black hbpm        1
#> 3 black hbpm        1
#> 4 black hbpm        1
#> 5 black hbpm        1
#> 6 black hbpm        1

dat2 %>% 
  group_by(race, grp, outcome) %>% 
  tally()
#> # A tibble: 8 × 4
#> # Groups:   race, grp [4]
#>   race     grp   outcome     n
#>   <fct>    <fct>   <dbl> <int>
#> 1 black    hbpm        0     5
#> 2 black    hbpm        1    63
#> 3 black    uc          0    49
#> 4 black    uc          1    24
#> 5 nonblack hbpm        0     3
#> 6 nonblack hbpm        1    32
#> 7 nonblack uc          0     9
#> 8 nonblack uc          1    21

This is partly based on the linked question from Limey in the comments:

library(tidyverse)

bind_rows(
  dat %>% uncount(ascertained) %>% mutate(outcome = 1) %>% select(-missed, -total), 
  dat %>% uncount(missed) %>% mutate(outcome = 0) %>% select(-ascertained, -total)
)

1) For each input row, set race and grp in the output to that row's race and grp, then generate the appropriate number of 1s and 0s for outcome. The result is 206 x 3.

library(dplyr)

dat %>%
  rowwise %>%
  summarize(race = race, grp = grp, outcome = rep(1:0, c(ascertained, missed)))

2) The example data has no duplicated race/grp combinations. If that holds in general, this can also be written as:

dat %>%
  group_by(race, grp) %>%
  summarize(outcome = rep(1:0, c(ascertained, missed)), .groups = "drop")
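
As a side note, in dplyr 1.1.0 and later, summarize() calls that return more than one row per group are deprecated in favour of reframe(). A minimal sketch of variant 2 in that style follows. It assumes dplyr >= 1.1.0 is installed and that each race/grp combination occurs on exactly one row. The data frame is rebuilt with data.frame() only to keep the sketch self-contained, and `expanded` is an illustrative name:

```r
library(dplyr)

# Same counts as the question's dat, rebuilt so the sketch runs on its own.
dat <- data.frame(
  race = factor(c("black", "nonblack", "black", "nonblack")),
  grp = factor(c("hbpm", "hbpm", "uc", "uc")),
  ascertained = c(63, 32, 24, 21),
  missed = c(5, 3, 49, 9),
  total = c(68, 35, 73, 30)
)

# reframe() may return any number of rows per group and always
# returns an ungrouped result; .by groups for this call only.
expanded <- dat %>%
  reframe(outcome = rep(1:0, c(ascertained, missed)), .by = c(race, grp))

dim(expanded)
#> [1] 206   3
```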

3) A base R solution follows. If each race/grp combination appears on only one row of the input, then 1:nrow(dat) can optionally be replaced with dat[1:2].

do.call("rbind", 
  by(dat, 
     1:nrow(dat), 
     with, 
     data.frame(race = race, grp = grp, outcome = rep(1:0, c(ascertained, missed)))
  )
)
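
Another base R route, not taken from the answer above but a common idiom, is to repeat row indices instead of building one data frame per input row. A sketch, assuming the same dat as in the question (rebuilt here so it runs standalone); `idx` and `out` are illustrative names:

```r
# Same counts as the question's dat, rebuilt so the sketch runs on its own.
dat <- data.frame(
  race = factor(c("black", "nonblack", "black", "nonblack")),
  grp = factor(c("hbpm", "hbpm", "uc", "uc")),
  ascertained = c(63, 32, 24, 21),
  missed = c(5, 3, 49, 9),
  total = c(68, 35, 73, 30)
)

# Repeat each row index once per observation counted in that row.
idx <- rep(seq_len(nrow(dat)), dat$ascertained + dat$missed)
out <- dat[idx, c("race", "grp")]

# For each input row: `ascertained` ones followed by `missed` zeros,
# concatenated in the same row order as idx.
out$outcome <- unlist(Map(function(a, m) rep(1:0, c(a, m)),
                          dat$ascertained, dat$missed))
rownames(out) <- NULL

# Cross-tabulate to compare against the original counts.
table(out$race, out$grp, out$outcome)
```

This avoids the repeated rbind of small data frames, which may matter if the input has many rows.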

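This is a relatively simple answer, partly based on the answer suggested in a comment, but adapted to your problem since you need more than one "count". It uses functions from the packages tibble, dplyr and tidyr, all of which are part of the tidyverse.
The approach is to create two sub-lists, one for the "ascertained" rows and one for the "missed" rows, format the ascertained column as needed, and then combine the two with tibble::add_row.
The relevant code is: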

library(tidyverse)
dat2 <- uncount(dat, ascertained, .remove = FALSE) %>%
  mutate(ascertained = 1) %>%
  select(-missed)
dat3 <- uncount(dat, missed, .remove = TRUE) %>%
  mutate(ascertained = 0)
dat4 <- add_row(dat2, dat3) %>%
  select(-total) %>%
  rename(outcome = ascertained)

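dat4 should be the data you asked for. I would also suggest generating an id column to make things easier to work with, but that is of course up to you.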
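
One possible way to add such an id column is sketched below. The column name `id` and the object name `dat4_id` are illustrative choices, and dat4 is rebuilt here with bind_rows() as a stand-in so the sketch runs on its own (assuming dplyr >= 1.0.0 for the `.before` argument of mutate()):

```r
library(dplyr)
library(tidyr)

# Same counts as the question's dat, rebuilt so the sketch runs on its own.
dat <- data.frame(
  race = factor(c("black", "nonblack", "black", "nonblack")),
  grp = factor(c("hbpm", "hbpm", "uc", "uc")),
  ascertained = c(63, 32, 24, 21),
  missed = c(5, 3, 49, 9),
  total = c(68, 35, 73, 30)
)

# Stand-in for dat4: one row per observation with a 0/1 outcome.
dat4 <- bind_rows(
  dat %>% uncount(ascertained) %>% mutate(outcome = 1),
  dat %>% uncount(missed) %>% mutate(outcome = 0)
) %>%
  select(race, grp, outcome)

# Add a sequential identifier as the first column.
dat4_id <- dat4 %>% mutate(id = row_number(), .before = 1)

head(dat4_id, 3)
```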