Transpose data based on a column and keep duplicated data (not quite a standard long-to-wide reshape)

Question

This is slightly different from a plain long-to-wide reshape. (Please don't mark it as a duplicate.)

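I have the data below. I want to transpose it based on the term column, filling the new columns with the corresponding values from the subject column. The result should look like df_result: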

DF <- data.frame(ID = c("10", "10", "10", "10", "10", "11", "11", "11", "12", "12"),
                 term = c("1", "1", "2", "2", "3", "1", "1", "2", "1", "1"),
                 subject = c("math1", "phys1", "math2", "chem1", "cmp1", "math1", "phys1", "math2", "math1", "phys1"),
                 graduation = c("grad", "grad", "grad", "grad", "grad", "drop", "drop", "drop", "enrolled", "enrolled"))

DF

ID   term   subject   graduation
10    1      math1      grad
10    1      phys1      grad
10    2      math2      grad
10    2      chem1      grad
10    3      cmp1       grad
11    1      math1      drop
11    1      phys1      drop
11    2      math2      drop
12    1      math1      enrolled
12    1      phys1      enrolled

df_result:

ID  term1  term2  term3   graduation
10  math1  math2  cmp1     grad
10  phys1  chem1  NA       grad
11  math1  math2  NA       drop
11  phys1   NA    NA       drop
12  math1   NA    NA       enrolled
12  phys1   NA    NA       enrolled

Using reshape produces something close to what I want, but it only keeps the first match.

reshape(DF, idvar = c("ID", "graduation"), timevar = "term", direction = "wide")

It produces:

  ID graduation subject.1 subject.2 subject.3
1 10       grad     math1     math2      cmp1
6 11       drop     math1     math2      <NA>
9 12   enrolled     math1      <NA>      <NA>

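The problem is that, for each value of timevar, only the first matching row is kept. Using dcast and melt instead just fills the cells with counts, because dcast falls back to length as the aggregation function.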
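reshape() keeps only the first match because ID plus graduation does not uniquely identify a row within a term (ID 10 has two subjects in term 1). A minimal base-R sketch, assuming the same DF as above: add a counter numbering the rows within each (ID, term) group and include it in idvar. The counter name classnum is just an illustrative choice.

```r
DF <- data.frame(ID = c("10", "10", "10", "10", "10", "11", "11", "11", "12", "12"),
                 term = c("1", "1", "2", "2", "3", "1", "1", "2", "1", "1"),
                 subject = c("math1", "phys1", "math2", "chem1", "cmp1",
                             "math1", "phys1", "math2", "math1", "phys1"),
                 graduation = c("grad", "grad", "grad", "grad", "grad",
                                "drop", "drop", "drop", "enrolled", "enrolled"))

# classnum = position of each row within its (ID, term) group: 1, 2, ...
DF$classnum <- ave(seq_along(DF$ID), DF$ID, DF$term, FUN = seq_along)

# With classnum in idvar, every (ID, graduation, classnum) row has at most one
# subject per term, so reshape() no longer drops the duplicates
wide <- reshape(DF, idvar = c("ID", "graduation", "classnum"),
                timevar = "term", direction = "wide")
wide
```

The new columns are named subject.1, subject.2 and subject.3, and the classnum column can be dropped afterwards if it is not needed.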

How can I solve this in R?

Answer 1

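This is the same as a long-to-wide reshape, but you need a new variable that uniquely identifies each row in the new format. Below I call this variable classnum (the position of a subject within its ID and term), and I use data.table syntax to create it: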

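# add helper variable "classnum": the row's position within each (ID, term) group
library(data.table)
setDT(DF)
DF[ , classnum := 1:.N, by = .(ID, term)]

# reshape long-to-wide (spread() still works but is superseded by pivot_wider() in current tidyr)
tidyr::spread(DF, term, subject)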

Result:

   ID graduation classnum     1     2    3
1: 10       grad        1 math1 math2 cmp1
2: 10       grad        2 phys1 chem1 <NA>
3: 11       drop        1 math1 math2 <NA>
4: 11       drop        2 phys1  <NA> <NA>
5: 12   enrolled        1 math1  <NA> <NA>
6: 12   enrolled        2 phys1  <NA> <NA>
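Since data.table is already loaded, the reshape step can also be done with its own dcast(). A sketch, assuming a data.table build that provides rowid() (equivalent here to 1:.N by ID and term). The helper column term_col is hypothetical; it only adds the "term" prefix so the output columns match df_result. Because classnum makes every cell unique, dcast has nothing to aggregate and does not fall back to length.

```r
library(data.table)

DT <- data.table(ID = c("10", "10", "10", "10", "10", "11", "11", "11", "12", "12"),
                 term = c("1", "1", "2", "2", "3", "1", "1", "2", "1", "1"),
                 subject = c("math1", "phys1", "math2", "chem1", "cmp1",
                             "math1", "phys1", "math2", "math1", "phys1"),
                 graduation = c("grad", "grad", "grad", "grad", "grad",
                                "drop", "drop", "drop", "enrolled", "enrolled"))

# Helper counter: 1, 2, ... within each (ID, term) group
DT[, classnum := rowid(ID, term)]

# Hypothetical helper column so the cast columns are named term1, term2, term3
DT[, term_col := paste0("term", term)]

# One row per (ID, graduation, classnum); missing cells become NA
res <- dcast(DT, ID + graduation + classnum ~ term_col, value.var = "subject")
res
```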
