Transpose data based on a column and keep duplicated data (not quite a standard long-to-wide reshape)
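This is slightly different from a standard long-to-wide reshape. (Please don't mark it as a duplicate.)
I have data like the following. I want to transpose it by the term column, filling the new columns with the corresponding values from the subject column, so that the result looks like df_result: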
DF <- data.frame(ID = c("10", "10", "10", "10", "10", "11", "11", "11", "12", "12"),
term = c("1", "1", "2", "2", "3", "1", "1", "2", "1", "1"),
subject = c("math1", "phys1", "math2", "chem1", "cmp1", "math1", "phys1", "math2", "math1", "phys1"),
graduation = c("grad", "grad", "grad", "grad", "grad", "drop", "drop", "drop", "enrolled", "enrolled"))
DF
ID term subject graduation
10 1 math1 grad
10 1 phys1 grad
10 2 math2 grad
10 2 chem1 grad
10 3 cmp1 grad
11 1 math1 drop
11 1 phys1 drop
11 2 math2 drop
12 1 math1 enrolled
12 1 phys1 enrolled
df_result:
ID term1 term2 term3 graduation
10 math1 math2 cmp1 grad
10 phys1 chem1 NA grad
11 math1 math2 NA drop
11 phys1 NA NA drop
12 math1 NA NA enrolled
12 phys1 NA NA enrolled
Using reshape produces something close to what I want, but it keeps only the first match:
reshape(DF, idvar = c("ID","graduation"), timevar = "term", direction = "wide")
It produces:
ID graduation subject.1 subject.2 subject.3
1 10 grad math1 math2 cmp1
6 11 drop math1 math2 <NA>
9 12 enrolled math1 <NA> <NA>
The problem is that, for each value of timevar, reshape keeps only the first matching row.
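A quick check of why the first match wins, sketched in base R with only the question's data (cell_counts is an illustrative name): cross-tabulating ID against term counts how many subjects compete for each cell of the wide table.

```r
# Example data from the question.
DF <- data.frame(
  ID = c("10", "10", "10", "10", "10", "11", "11", "11", "12", "12"),
  term = c("1", "1", "2", "2", "3", "1", "1", "2", "1", "1"),
  subject = c("math1", "phys1", "math2", "chem1", "cmp1",
              "math1", "phys1", "math2", "math1", "phys1"),
  graduation = c("grad", "grad", "grad", "grad", "grad",
                 "drop", "drop", "drop", "enrolled", "enrolled"),
  stringsAsFactors = FALSE
)

# Cross-tabulate ID against term: each count is the number of subjects that
# would have to fit into a single ID/term cell of the wide table.
cell_counts <- with(DF, table(ID, term))
print(cell_counts)

# Any count above 1 means there is no room for every subject, so reshape()
# keeps only the first one for that cell.
any(cell_counts > 1)
```

Every ID has two subjects in term 1, so with ID (plus graduation) as the only identifier, one of them has to be dropped.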
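Using dcast and melt only fills the table with counts, because dcast falls back to length as the aggregate function when several values land in the same cell.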
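A minimal sketch of that dcast behaviour, assuming data.table is installed (DT, counts and wide are illustrative names): without a row counter, dcast() aggregates each ID/term cell with length; numbering the rows within each ID/term pair with data.table's rowid() makes every cell unique, so the subject names are kept.

```r
library(data.table)

# Example data from the question, as a data.table.
DT <- data.table(
  ID = c("10", "10", "10", "10", "10", "11", "11", "11", "12", "12"),
  term = c("1", "1", "2", "2", "3", "1", "1", "2", "1", "1"),
  subject = c("math1", "phys1", "math2", "chem1", "cmp1",
              "math1", "phys1", "math2", "math1", "phys1"),
  graduation = c("grad", "grad", "grad", "grad", "grad",
                 "drop", "drop", "drop", "enrolled", "enrolled")
)

# Several subjects share an ID/term cell, so dcast() reports that the
# formula does not identify rows uniquely and falls back to length().
counts <- dcast(DT, ID + graduation ~ term, value.var = "subject")
print(counts)

# Number the rows within each ID/term pair (1, 2, ...) and add that counter
# to the left-hand side of the formula: each cell now holds one subject.
DT[, classnum := rowid(ID, term)]
wide <- dcast(DT, ID + graduation + classnum ~ term, value.var = "subject")
print(wide)
```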
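How can I solve this in R?
This is the same as a long-to-wide reshape, except that you need an extra variable that uniquely identifies each row in the new format. Below I call this variable classnum, and I use data.table syntax to create it: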
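# add helper variable "classnum": position of each subject within its ID/term group
library(data.table)
setDT(DF)
DF[ , classnum := 1:.N, by = .(ID, term)]
# reshape long-to-wide: one column per term, one row per ID/graduation/classnum
tidyr::spread(DF, term, subject)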
Result:
ID graduation classnum 1 2 3
1: 10 grad 1 math1 math2 cmp1
2: 10 grad 2 phys1 chem1 <NA>
3: 11 drop 1 math1 math2 <NA>
4: 11 drop 2 phys1 <NA> <NA>
5: 12 enrolled 1 math1 <NA> <NA>
6: 12 enrolled 2 phys1 <NA> <NA>
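The same helper-column idea also works in base R without extra packages. A minimal sketch, assuming the question's data and that the term1/term2/term3 column names of df_result are wanted: ave() numbers the rows within each ID/term pair, and the question's reshape() call then only needs classnum added to idvar.

```r
# Example data from the question.
DF <- data.frame(
  ID = c("10", "10", "10", "10", "10", "11", "11", "11", "12", "12"),
  term = c("1", "1", "2", "2", "3", "1", "1", "2", "1", "1"),
  subject = c("math1", "phys1", "math2", "chem1", "cmp1",
              "math1", "phys1", "math2", "math1", "phys1"),
  graduation = c("grad", "grad", "grad", "grad", "grad",
                 "drop", "drop", "drop", "enrolled", "enrolled"),
  stringsAsFactors = FALSE
)

# Number the subjects within each ID/term pair (1, 2, ...): the base R
# counterpart of classnum := 1:.N, by = .(ID, term).
DF$classnum <- ave(seq_len(nrow(DF)), DF$ID, DF$term, FUN = seq_along)

# With classnum in idvar, each ID/graduation/classnum row has at most one
# subject per term, so reshape() has nothing to drop.
wide <- reshape(DF, idvar = c("ID", "graduation", "classnum"),
                timevar = "term", direction = "wide")

# Rename subject.1, subject.2, ... to term1, term2, ... and drop the helper
# column to match the layout of df_result.
names(wide) <- sub("subject.", "term", names(wide), fixed = TRUE)
df_result <- wide[, c("ID", "term1", "term2", "term3", "graduation")]
rownames(df_result) <- NULL
print(df_result)
```

Compared with the call in the question, the only real change is the extra identifier column; once it exists, any long-to-wide tool (reshape, dcast, spread) can place every subject.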