Long to wide - translating multiple columns into multiple new columns (2>>6)

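I have a dataset in "long" format that I would like to convert to "wide" format. I want to group by a set of columns and spread the remaining columns into corresponding pairs. I think I know how to go from long to wide when only one column is being 'widened', but I can't get it to work when several columns have to be widened at the same time.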

See the example below for the desired starting point and end result.

Start:

    df1 <- structure(list(
      gender = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 2L),
                         .Label = c("female", "male"), class = "factor"),
      state = structure(c(3L, 3L, 3L, 1L, 1L, 1L, 2L),
                        .Label = c("ca", "ny", "tx"), class = "factor"),
      name = structure(c(3L, 5L, 7L, 6L, 1L, 2L, 4L),
                       .Label = c("ashley", "jackie", "john", "luke", "mark", "mary", "rob"),
                       class = "factor"),
      value = c(1L, 2L, 3L, 1L, 2L, 3L, 1L)),
      .Names = c("gender", "state", "name", "value"),
      class = "data.frame", row.names = c(NA, -7L))

End:

    structure(list(
      gender = structure(c(2L, 1L, 2L), .Label = c("female", "male"), class = "factor"),
      state = structure(c(3L, 1L, 2L), .Label = c("ca", "ny", "tx"), class = "factor"),
      value1 = c(1L, 1L, 1L),
      name1 = structure(c(1L, 3L, 2L), .Label = c("john", "luke", "mary"), class = "factor"),
      value2 = c(2L, 2L, NA),
      name2 = structure(c(2L, 1L, NA), .Label = c("ashley", "mark"), class = "factor"),
      value3 = c(3L, 3L, NA),
      name3 = structure(c(2L, 1L, NA), .Label = c("jackie", "rob"), class = "factor")),
      .Names = c("gender", "state", "value1", "name1", "value2", "name2", "value3", "name3"),
      class = "data.frame", row.names = c(NA, -3L))

We can use dcast from data.table, which accepts multiple 'value.var' columns. Convert the data.frame to a data.table (setDT(df1)), create a sequence column ("N") within each "gender"/"state" group, and then use dcast to reshape from 'long' to 'wide' format.

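    library(data.table)
    # Convert df1 to a data.table in place and number the rows 1, 2, 3, ...
    # within each gender/state group
    setDT(df1)[, N := seq_len(.N), by = .(gender, state)]
    # Spread both "name" and "value" across the sequence numbers in one call
    dcast(df1, gender + state ~ N, value.var = c("name", "value"), sep = "")
    #    gender state name1  name2  name3 value1 value2 value3
    #1: female    ca  mary ashley jackie      1      2      3
    #2:   male    ny  luke     NA     NA      1     NA     NA
    #3:   male    tx  john   mark    rob      1      2      3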
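A possible variant, assuming a data.table version that provides rowid(): the sequence number can be computed inside the dcast() formula itself, so no helper column "N" has to be added to the data first. The sketch below rebuilds the example data with character columns instead of factors (an assumption made only to keep it self-contained); dt and wide are illustrative names.

```r
library(data.table)

# Example data from the question, rebuilt with character columns
dt <- data.table(
  gender = c("male", "male", "male", "female", "female", "female", "male"),
  state  = c("tx", "tx", "tx", "ca", "ca", "ca", "ny"),
  name   = c("john", "mark", "rob", "mary", "ashley", "jackie", "luke"),
  value  = c(1L, 2L, 3L, 1L, 2L, 3L, 1L)
)

# rowid(gender, state) numbers the rows 1, 2, 3, ... within each
# gender/state group directly in the formula; dt itself is not modified
wide <- dcast(dt, gender + state ~ rowid(gender, state),
              value.var = c("name", "value"), sep = "")
wide
```

The result should have the same columns as the setDT() version (name1 to name3, then value1 to value3); the difference is only that the grouping counter is not stored in the input table.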

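This can also be done with reshape from base R once the sequence column has been created. Start from the original data.frame df1 here, because setDT() above converts df1 to a data.table in place: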

    # Number the rows 1, 2, 3, ... within each gender/state group
    dfN <- transform(df1, N = ave(seq_along(state), gender, state, FUN = seq_along))
    # Widen every column that is neither an id nor the time variable (name and value)
    reshape(dfN, idvar = c("gender", "state"), timevar = "N",
            direction = "wide")
    #  gender state name.1 value.1 name.2 value.2 name.3 value.3
    #1   male    tx   john       1   mark       2    rob       3
    #4 female    ca   mary       1 ashley       2 jackie       3
    #7   male    ny   luke       1   <NA>      NA   <NA>      NA
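Neither output above uses exactly the column layout from "End:" (value1, name1, value2, name2, ...). A minimal base-R sketch of one way to get there: strip the "." that reshape puts into the new names, then reorder the columns pairwise. It again rebuilds the example data with character columns instead of factors; wide, n_max and pairs are hypothetical names used only for illustration.

```r
# Example data from the question, rebuilt with character columns
df1 <- data.frame(
  gender = c("male", "male", "male", "female", "female", "female", "male"),
  state  = c("tx", "tx", "tx", "ca", "ca", "ca", "ny"),
  name   = c("john", "mark", "rob", "mary", "ashley", "jackie", "luke"),
  value  = c(1L, 2L, 3L, 1L, 2L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Sequence number within each gender/state group, then widen as above
dfN <- transform(df1, N = ave(seq_along(state), gender, state, FUN = seq_along))
wide <- reshape(dfN, idvar = c("gender", "state"), timevar = "N",
                direction = "wide")

# Drop the "." between variable name and time: name.1 -> name1
names(wide) <- sub(".", "", names(wide), fixed = TRUE)

# Interleave as value1, name1, value2, name2, ...: rbind() builds a
# 2-row matrix and as.vector() reads it column by column
n_max <- max(dfN$N)
pairs <- as.vector(rbind(paste0("value", seq_len(n_max)),
                         paste0("name", seq_len(n_max))))
wide <- wide[, c("gender", "state", pairs)]
rownames(wide) <- NULL
wide
```

Because the pair names are generated from max(dfN$N), the same reordering should work for groups of any size, not just three rows per group.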