Spreading data that is grouped by ID but has different observations
I have this data:
drugData <- data.frame(caseID=c(9, 9, 10, 11, 12, 12, 12, 12, 13, 45, 45, 225),
                       Drug=c("Cocaine", "Cocaine", "DPT", "LSD", "Cocaine", "LSD", "Heroin", "Heroin", "LSD", "DPT", "DPT", "Heroin"),
                       County=c("A", "A", "B", "C", "D", "D", "D", "D", "E", "F", "F", "G"),
                       Date=c(2009, 2009, 2009, 2009, 2011, 2011, 2011, 2011, 2010, 2010, 2010, 2005))
The rows that share a caseID make up one case, which may contain repeated observations of the same drug or observations of different drugs. I would like the data to look like this:
CaseID  Drug.1   Drug.2   Drug.3  Drug.4  County  Date
9       Cocaine  Cocaine  NA      NA      A       2009
10      DPT      NA       NA      NA      B       2009
11      LSD      NA       NA      NA      C       2009
12      Cocaine  LSD      Heroin  Heroin  D       2011
13      LSD      NA       NA      NA      E       2010
45      DPT      DPT      NA      NA      F       2010
225     Heroin   NA       NA      NA      G       2005
I have tried using the dplyr spread function but can't quite get it to work. Thanks!
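We can pivot to wide format after creating a sequence column based on 'caseID'. Here rowid() comes from data.table; it numbers the rows within each caseID, so every drug in a case gets its own column name (Drug1, Drug2, ...).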
library(dplyr)
library(tidyr)
library(stringr)
library(data.table)
drugData %>%
mutate(nm = str_c('Drug', rowid(caseID))) %>%
pivot_wider(names_from = nm, values_from = Drug)
#A tibble: 7 x 7
# caseID County Date Drug1 Drug2 Drug3 Drug4
# <dbl> <fct> <dbl> <fct> <fct> <fct> <fct>
#1 9 A 2009 Cocaine Cocaine <NA> <NA>
#2 10 B 2009 DPT <NA> <NA> <NA>
#3 11 C 2009 LSD <NA> <NA> <NA>
#4 12 D 2011 Cocaine LSD Heroin Heroin
#5 13 E 2010 LSD <NA> <NA> <NA>
#6 45 F 2010 DPT DPT <NA> <NA>
#7 225 G 2005 Heroin <NA> <NA> <NA>
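If you would rather not load data.table just for rowid(), the same sequence column can be built with dplyr alone. The following is a minimal sketch, not part of the original answer: it assumes dplyr 1.0.0 or later (for relocate()), uses the hypothetical name `wide` for the result, and names the columns Drug.1, Drug.2, ... and moves County and Date to the end to match the layout asked for in the question.

```r
library(dplyr)
library(tidyr)

drugData <- data.frame(
  caseID = c(9, 9, 10, 11, 12, 12, 12, 12, 13, 45, 45, 225),
  Drug   = c("Cocaine", "Cocaine", "DPT", "LSD", "Cocaine", "LSD",
             "Heroin", "Heroin", "LSD", "DPT", "DPT", "Heroin"),
  County = c("A", "A", "B", "C", "D", "D", "D", "D", "E", "F", "F", "G"),
  Date   = c(2009, 2009, 2009, 2009, 2011, 2011, 2011, 2011, 2010, 2010, 2010, 2005)
)

wide <- drugData %>%
  group_by(caseID) %>%
  # number the rows within each case: Drug.1, Drug.2, ...
  mutate(nm = paste0("Drug.", row_number())) %>%
  ungroup() %>%
  # one column per sequence number, filled with NA where a case has fewer drugs
  pivot_wider(names_from = nm, values_from = Drug) %>%
  # put County and Date last, as in the desired output
  relocate(County, Date, .after = last_col())

wide
```

group_by() plus row_number() plays the role of rowid(caseID); the ungroup() call keeps the grouping from carrying over into the reshaped result.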
Or using spread (spread is superseded by pivot_wider, so pivot_wider is preferred in new code)
drugData %>%
mutate(nm = str_c('Drug', rowid(caseID))) %>%
spread(nm, Drug)
Or using data.table
dcast(setDT(drugData), caseID + County + Date ~
paste0('Drug', rowid(caseID)), value.var = 'Drug')
# caseID County Date Drug1 Drug2 Drug3 Drug4
#1: 9 A 2009 Cocaine Cocaine <NA> <NA>
#2: 10 B 2009 DPT <NA> <NA> <NA>
#3: 11 C 2009 LSD <NA> <NA> <NA>
#4: 12 D 2011 Cocaine LSD Heroin Heroin
#5: 13 E 2010 LSD <NA> <NA> <NA>
#6: 45 F 2010 DPT DPT <NA> <NA>
#7: 225 G 2005 Heroin <NA> <NA> <NA>
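For completeness, the same idea also works without any packages. This is a rough sketch that is not part of the original answer: it builds the within-case sequence with ave() and widens with stats::reshape(). The column name `idx` and the result name `wide` are hypothetical names chosen for illustration.

```r
drugData <- data.frame(
  caseID = c(9, 9, 10, 11, 12, 12, 12, 12, 13, 45, 45, 225),
  Drug   = c("Cocaine", "Cocaine", "DPT", "LSD", "Cocaine", "LSD",
             "Heroin", "Heroin", "LSD", "DPT", "DPT", "Heroin"),
  County = c("A", "A", "B", "C", "D", "D", "D", "D", "E", "F", "F", "G"),
  Date   = c(2009, 2009, 2009, 2009, 2011, 2011, 2011, 2011, 2010, 2010, 2010, 2005)
)

# sequence number of each row within its caseID (1, 2, ... per case)
drugData$idx <- ave(seq_along(drugData$caseID), drugData$caseID, FUN = seq_along)

# one row per case; Drug is spread into Drug.1, Drug.2, ... using idx
wide <- reshape(drugData,
                idvar = c("caseID", "County", "Date"),
                timevar = "idx",
                direction = "wide")

wide
```

reshape() joins the value column name and the sequence number with "." by default, which happens to give the Drug.1 ... Drug.4 names from the question.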