Reshape data frame by id and variable type

I'm having trouble reshaping the following data frame:

dat1 <- data.frame(
   id = rep(1, 4),
   var = paste0(rep(c("firstName",  "secondName"), each= 2), c(rep(1:2, 2))),
   value = c(1:4)
 )
dat2 <- data.frame(
   id = rep(2,3),
   var = paste0(rep(c("firstName", "secondName"), each = 2)[1:3], c(rep(1:2, 2))[1:3]),
  value = c(5:7)
)
dat = rbind(dat1, dat2)
dat$type = gsub('[0-9]', '', dat$var)
# > dat
#   id         var value       type
# 1  1  firstName1     1  firstName
# 2  1  firstName2     2  firstName
# 3  1 secondName1     3 secondName
# 4  1 secondName2     4 secondName
# 5  2  firstName1     5  firstName
# 6  2  firstName2     6  firstName
# 7  2 secondName1     7 secondName

I'd like to get the following result:

id firstName secondName
 1         1          3
 1         2          4
 2         5          7
 2         6         NA

I tried unstack(dat, form = value ~ type), but it didn't work.
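For reference, a minimal sketch (rebuilding the first version of the data by hand) of what unstack() returns here: it splits value by type and ignores id completely. Because firstName has four values and secondName only three, base R's unstack() cannot build a data frame and returns a plain list instead.

```r
# First version of the question's data, written out by hand
dat <- data.frame(
  id    = c(1, 1, 1, 1, 2, 2, 2),
  var   = c("firstName1", "firstName2", "secondName1", "secondName2",
            "firstName1", "firstName2", "secondName1"),
  value = 1:7,
  stringsAsFactors = FALSE
)
dat$type <- gsub("[0-9]", "", dat$var)

# unstack() groups `value` by `type` only; `id` plays no part.
# The groups have lengths 4 and 3, so the result stays a list.
res <- unstack(dat, form = value ~ type)
str(res)
```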

Update: firstName1 should correspond to secondName1, so suppose I change dat2 to:

dat2 <- data.frame(
   id = rep(2, 3),
   var = paste0(rep(c("firstName", "secondName"), each = 2)[2:4], c(rep(1:2, 2))[2:4]),
   value = c(5:7)
)
# > dat
#    id         var value       type
# 1:  1  firstName1     1  firstName
# 2:  1  firstName2     2  firstName
# 3:  1 secondName1     3 secondName
# 4:  1 secondName2     4 secondName
# 5:  2  firstName2     5  firstName
# 6:  2 secondName1     6 secondName
# 7:  2 secondName2     7 secondName

For id = 2, the (firstName, secondName) rows should then be c(NA, 6) and c(5, 7). How can I handle this case?

Try dcast:

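res <- data.table::dcast(
    data.table::as.data.table(dat),
    # the last character of var ("1" or "2") is the index within each type
    id  + substring(as.character(var), nchar(as.character(var))) ~ type,
    value.var = 'value')

res <- as.data.frame(res)[-2]   # drop the helper index column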

# > res
#   id firstName secondName
# 1  1         1          3
# 2  1         2          4
# 3  2         5          7
# 4  2         6         NA

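substring(as.character(var), nchar(as.character(var))) takes the last character of the second column (var) and uses it as a grouping variable. This assumes the index is a single digit.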
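A small illustration of that single-digit assumption, using a made-up vector that includes a hypothetical two-digit index ("firstName10"). The sub() call is an alternative suggested here, not part of the answer: it strips the leading non-digit characters, so multi-digit indices survive.

```r
x <- c("firstName1", "secondName2", "firstName10")

# The answer's trick: keep only the last character of each string.
last_char <- substring(x, nchar(x))
last_char   # "1" "2" "0"  (the two-digit index 10 is cut down to "0")

# Alternative (not from the answer): drop the leading non-digit
# characters, which keeps multi-digit indices intact.
index <- sub("^[^0-9]+", "", x)
index       # "1" "2" "10"
```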

I think a better option is to use the rowid function from data.table:

library(data.table)
dcast(setDT(dat), id + rowid(type) ~ type, value.var = 'value')[, type := NULL][]

which gives:

   id firstName secondName
1:  1         1          3
2:  1         2          4
3:  2         5          7
4:  2         6         NA
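Note that rowid(type) numbers each type across the whole table, not per id. It lines up for this data, but it can misalign rows when ids have different numbers of entries per type. A hedged sketch of a more defensive variant, assuming data.table is installed, counts with rowid(id, type) instead and stores the counter in a helper column; the name idx is chosen here purely for illustration.

```r
library(data.table)

dt <- data.table(
  id    = c(1, 1, 1, 1, 2, 2, 2),
  type  = c("firstName", "firstName", "secondName", "secondName",
            "firstName", "firstName", "secondName"),
  value = 1:7
)

# rowid(type) counts each type over the whole table ...
dt[, rowid(type)]       # 1 2 1 2 3 4 3
# ... while rowid(id, type) restarts the count for every id.
dt[, rowid(id, type)]   # 1 2 1 2 1 2 1

# Keep the per-id counter in a helper column, cast, then drop it.
dt[, idx := rowid(id, type)]
res <- dcast(dt, id + idx ~ type, value.var = "value")[, idx := NULL][]
res
```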

For the updated question:

# num = the trailing digit of var, i.e. the index that pairs firstName with secondName
setDT(dat)[, num := gsub('.*([0-9])', '\\1', var)
           ][, dcast(.SD, id + num ~ type, value.var = 'value')
             ][, num := NULL][]

which gives:

   id firstName secondName
1:  1         1          3
2:  1         2          4
3:  2        NA          6
4:  2         5          7
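To see why the index has to come from var rather than from a row counter, here is a sketch on the updated data (assuming data.table is installed). A positional counter such as rowid(id, type) cannot know that firstName1 is missing for id = 2, so it would pair 5 with 6; the digit extracted from var keeps the real index.

```r
library(data.table)

# Updated data from the question: id = 2 has no firstName1.
dat <- data.table(
  id    = c(1, 1, 1, 1, 2, 2, 2),
  var   = c("firstName1", "firstName2", "secondName1", "secondName2",
            "firstName2", "secondName1", "secondName2"),
  value = 1:7
)
dat[, type := gsub("[0-9]", "", var)]

# A positional counter puts firstName2 (value 5) at position 1 for id = 2 ...
dat[, rowid(id, type)]   # 1 2 1 2 1 1 2
# ... whereas the digit taken from var keeps the real index.
dat[, num := gsub(".*([0-9])", "\\1", var)]
dat$num                  # "1" "2" "1" "2" "2" "1" "2"

res <- dcast(dat, id + num ~ type, value.var = "value")[, num := NULL][]
res
```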

library(tidyr)

# split var just before its trailing digits, then spread the name part into columns
rbind(dat1, dat2) %>%
  separate(var, c("name", "index"), "(?=\\d+$)") %>%
  spread(key = name, value = value)

Result:

  id index firstName secondName
1  1     1         1          3
2  1     2         2          4
3  2     1         5          7
4  2     2         6         NA

Note:

If you want to drop the index column, append %>% dplyr::select(-index) to the end of the pipeline.
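separate() and spread() are superseded in current tidyr. A hedged sketch of the same idea with pivot_wider() (tidyr 1.0 or later), applied to the updated data, splits var with base sub() instead; this swaps in a different function from the answer's pipeline. Because rows are keyed by (id, index), the missing firstName1 for id = 2 simply becomes NA.

```r
library(tidyr)

# Updated data from the question (id = 2 has no firstName1).
dat <- data.frame(
  id    = c(1, 1, 1, 1, 2, 2, 2),
  var   = c("firstName1", "firstName2", "secondName1", "secondName2",
            "firstName2", "secondName1", "secondName2"),
  value = 1:7,
  stringsAsFactors = FALSE
)

# Split var into a name part and a trailing numeric index with base R.
dat$name  <- sub("[0-9]+$", "", dat$var)
dat$index <- sub("^[^0-9]+", "", dat$var)

# One row per (id, index); the name values become columns and
# missing combinations are filled with NA.
res <- pivot_wider(dat[c("id", "index", "name", "value")],
                   names_from = name, values_from = value)
res <- res[order(res$id, res$index), ]
res
```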