Spreading data that is grouped by ID but has different observations

I have this data:

drugData <- data.frame(caseID=c(9, 9, 10, 11, 12, 12, 12, 12, 13, 45, 45, 225),
            Drug=c("Cocaine", "Cocaine", "DPT", "LSD", "Cocaine", "LSD", "Heroin","Heroin", "LSD", "DPT", "DPT", "Heroin"),
             County=c("A", "A", "B", "C", "D", "D", "D","D", "E", "F", "F", "G"),
             Date=c(2009, 2009, 2009, 2009, 2011, 2011, 2011, 2011, 2010, 2010, 2010, 2005))

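The rows sharing a "caseID" make up one case, in which the observed drugs may all be the same or may be of different kinds. I would like the data to look like this: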

CaseID  Drug.1     Drug.2    Drug.3    Drug.4    County   Date
9       Cocaine    Cocaine   NA        NA        A        2009
10      DPT        NA        NA        NA        B        2009
11      LSD        NA        NA        NA        C        2009
12      Cocaine    LSD       Heroin    Heroin    D        2011
13      LSD        NA        NA        NA        E        2010
45      DPT        DPT       NA        NA        F        2010
225     Heroin     NA        NA        NA        G        2005

I have tried using the spread function (from tidyr, used alongside dplyr), but can't seem to get it to work quite right. Thanks!

We can reshape to wide format after creating a sequence column based on 'caseID':
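library(dplyr)
library(tidyr)
library(stringr)
library(data.table)  # provides rowid()
drugData %>%
   # number the rows within each caseID and build the new column names: Drug1, Drug2, ...
   mutate(nm = str_c('Drug', rowid(caseID))) %>%
   pivot_wider(names_from = nm, values_from = Drug)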
# A tibble: 7 x 7
#  caseID County  Date Drug1   Drug2   Drug3  Drug4 
#   <dbl> <fct>  <dbl> <fct>   <fct>   <fct>  <fct> 
#1      9 A       2009 Cocaine Cocaine <NA>   <NA>  
#2     10 B       2009 DPT     <NA>    <NA>   <NA>  
#3     11 C       2009 LSD     <NA>    <NA>   <NA>  
#4     12 D       2011 Cocaine LSD     Heroin Heroin
#5     13 E       2010 LSD     <NA>    <NA>   <NA>  
#6     45 F       2010 DPT     DPT     <NA>   <NA>  
#7    225 G       2005 Heroin  <NA>    <NA>   <NA>  
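
The output above has the column names Drug1 ... Drug4 and keeps County and Date before the drug columns, whereas the question asked for Drug.1 ... Drug.4 with County and Date at the end. A minimal sketch of one way to get that layout using only dplyr and tidyr is shown below. It assumes dplyr >= 1.0.0 (for relocate()) and swaps data.table's rowid() for a grouped row_number(); the object name `wide` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
drugData <- data.frame(
  caseID = c(9, 9, 10, 11, 12, 12, 12, 12, 13, 45, 45, 225),
  Drug   = c("Cocaine", "Cocaine", "DPT", "LSD", "Cocaine", "LSD",
             "Heroin", "Heroin", "LSD", "DPT", "DPT", "Heroin"),
  County = c("A", "A", "B", "C", "D", "D", "D", "D", "E", "F", "F", "G"),
  Date   = c(2009, 2009, 2009, 2009, 2011, 2011, 2011, 2011, 2010, 2010, 2010, 2005)
)

wide <- drugData %>%
  group_by(caseID) %>%
  mutate(nm = row_number()) %>%   # 1, 2, ... within each caseID
  ungroup() %>%
  # names_prefix turns 1, 2, ... into Drug.1, Drug.2, ...
  pivot_wider(names_from = nm, values_from = Drug, names_prefix = "Drug.") %>%
  # move County and Date behind the drug columns, as in the desired output
  relocate(County, Date, .after = last_col())

wide
```

Cases with fewer than four drugs are padded with NA automatically, because pivot_wider fills missing combinations with NA by default.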

Or using spread (note that spread has been superseded by pivot_wider, so pivot_wider is preferred in new code):

drugData %>%
   mutate(nm = str_c('Drug', rowid(caseID))) %>% 
   spread(nm, Drug)

Or using data.table:

dcast(setDT(drugData), caseID + County  + Date ~
         paste0('Drug', rowid(caseID)), value.var = 'Drug')
#   caseID County Date   Drug1   Drug2  Drug3  Drug4
#1:      9      A 2009 Cocaine Cocaine   <NA>   <NA>
#2:     10      B 2009     DPT    <NA>   <NA>   <NA>
#3:     11      C 2009     LSD    <NA>   <NA>   <NA>
#4:     12      D 2011 Cocaine     LSD Heroin Heroin
#5:     13      E 2010     LSD    <NA>   <NA>   <NA>
#6:     45      F 2010     DPT     DPT   <NA>   <NA>
#7:    225      G 2005  Heroin    <NA>   <NA>   <NA>
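
For completeness, the same "number the rows within each case, then go wide" idea can also be sketched in base R without any packages. This is not part of the approaches above, just an alternative under the assumption that the data look like the sample in the question; `nm` and `wide_base` are illustrative names. With its default separator, reshape() happens to produce the Drug.1 ... Drug.4 names requested in the question.

```r
# Same sample data as in the question
drugData <- data.frame(
  caseID = c(9, 9, 10, 11, 12, 12, 12, 12, 13, 45, 45, 225),
  Drug   = c("Cocaine", "Cocaine", "DPT", "LSD", "Cocaine", "LSD",
             "Heroin", "Heroin", "LSD", "DPT", "DPT", "Heroin"),
  County = c("A", "A", "B", "C", "D", "D", "D", "D", "E", "F", "F", "G"),
  Date   = c(2009, 2009, 2009, 2009, 2011, 2011, 2011, 2011, 2010, 2010, 2010, 2005)
)

# Sequence number of each row within its caseID (1, 2, ...)
drugData$nm <- ave(seq_along(drugData$caseID), drugData$caseID, FUN = seq_along)

# One row per caseID/County/Date; the Drug values are spread over Drug.1, Drug.2, ...
wide_base <- reshape(drugData,
                     idvar     = c("caseID", "County", "Date"),
                     timevar   = "nm",
                     direction = "wide")
rownames(wide_base) <- NULL   # drop the row names carried over from the long data

wide_base
```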