Long to wide: spreading multiple columns into multiple new columns (2 >> 6)
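I have a dataset in "long" format that I want to change to "wide" format. I want to group by one set of columns and spread the remaining columns into corresponding pairs. I think I know how to go from long to wide when only one column is being widened, but I can't get it to work when several columns need to be widened at the same time.
Please see the example below for the desired start and end points.
Start: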
structure(list(gender = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 2L
), .Label = c("female", "male"), class = "factor"), state = structure(c(3L,
3L, 3L, 1L, 1L, 1L, 2L), .Label = c("ca", "ny", "tx"), class = "factor"),
name = structure(c(3L, 5L, 7L, 6L, 1L, 2L, 4L), .Label = c("ashley",
"jackie", "john", "luke", "mark", "mary", "rob"), class = "factor"),
value = c(1L, 2L, 3L, 1L, 2L, 3L, 1L)), .Names = c("gender",
"state", "name", "value"), class = "data.frame", row.names = c(NA,
-7L))
End:
structure(list(gender = structure(c(2L, 1L, 2L), .Label = c("female",
"male"), class = "factor"), state = structure(c(3L, 1L, 2L), .Label = c("ca",
"ny", "tx"), class = "factor"), value1 = c(1L, 1L, 1L), name1 = structure(c(1L,
3L, 2L), .Label = c("john", "luke", "mary"), class = "factor"),
value2 = c(2L, 2L, NA), name2 = structure(c(2L, 1L, NA), .Label = c("ashley",
"mark"), class = "factor"), value3 = c(3L, 3L, NA), name3 = structure(c(2L,
1L, NA), .Label = c("jackie", "rob"), class = "factor")), .Names = c("gender",
"state", "value1", "name1", "value2", "name2", "value3", "name3"
), class = "data.frame", row.names = c(NA, -3L))
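We can use dcast from data.table (version 1.9.6 or later), which accepts multiple 'value.var' columns. We convert the data.frame to a data.table with setDT(df1), create a sequence column ("N") that numbers the rows within each "gender" and "state" group, and then use dcast to convert from 'long' to 'wide' format.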
library(data.table)
# df1 is the "Start" data.frame above; setDT() converts it by reference
setDT(df1)[, N := seq_len(.N), by = .(gender, state)]  # running index within each group
dcast(df1, gender + state ~ N, value.var = c("name", "value"), sep = "")
#    gender state name1  name2  name3 value1 value2 value3
# 1: female    ca  mary ashley jackie      1      2      3
# 2:   male    ny  luke     NA     NA      1     NA     NA
# 3:   male    tx  john   mark    rob      1      2      3
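The dcast result groups all the name columns before the value columns, whereas the desired "End" layout interleaves them as value1, name1, value2, name2, and so on. The following is a minimal sketch of one way to reorder them, assuming the data.table package is installed. The data.frame literal is a hand-written reconstruction of the question's "Start" data, and the names dt, wide, n_max and pair_order are illustrative only.

```r
library(data.table)

# Reconstruction of the question's "Start" data (assumed equivalent to its dput output)
df1 <- data.frame(
  gender = c("male", "male", "male", "female", "female", "female", "male"),
  state  = c("tx", "tx", "tx", "ca", "ca", "ca", "ny"),
  name   = c("john", "mark", "rob", "mary", "ashley", "jackie", "luke"),
  value  = c(1L, 2L, 3L, 1L, 2L, 3L, 1L),
  stringsAsFactors = TRUE
)

dt <- as.data.table(df1)                        # work on a copy, df1 stays a data.frame
dt[, N := seq_len(.N), by = .(gender, state)]   # running index within each group
wide <- dcast(dt, gender + state ~ N, value.var = c("name", "value"), sep = "")

# Build the interleaved order: value1, name1, value2, name2, ...
n_max <- max(dt$N)
pair_order <- as.vector(rbind(paste0("value", seq_len(n_max)),
                              paste0("name",  seq_len(n_max))))
setcolorder(wide, c("gender", "state", pair_order))  # reorders by reference
wide
```

Using as.data.table() instead of setDT() here is a design choice so that the original df1 is not modified by reference and can still be passed to other approaches unchanged.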
This can also be done with reshape from base R, after creating the sequence column.
# Starting from the original data.frame df1: add a within-group index N
dfN <- transform(df1, N = ave(seq_along(state),
                              gender, state, FUN = seq_along))
reshape(dfN, idvar = c('gender', 'state'), timevar = 'N',
        direction = 'wide')
#   gender state name.1 value.1 name.2 value.2 name.3 value.3
# 1   male    tx   john       1   mark       2    rob       3
# 4 female    ca   mary       1 ashley       2 jackie       3
# 7   male    ny   luke       1   <NA>      NA   <NA>      NA
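The reshape output uses dotted column names (name.1, value.1) and keeps the row names of the first row in each group (1, 4, 7). The sketch below shows one way to bring it closer to the desired "End" layout in base R only. It passes sep = "" to reshape so the columns come out as name1, value1, and so on, then reorders the columns and resets the row names. The data.frame literal is a hand-written reconstruction of the question's "Start" data, and the names wide and pair_order are illustrative.

```r
# Reconstruction of the question's "Start" data (assumed equivalent to its dput output)
df1 <- data.frame(
  gender = c("male", "male", "male", "female", "female", "female", "male"),
  state  = c("tx", "tx", "tx", "ca", "ca", "ca", "ny"),
  name   = c("john", "mark", "rob", "mary", "ashley", "jackie", "luke"),
  value  = c(1L, 2L, 3L, 1L, 2L, 3L, 1L),
  stringsAsFactors = TRUE
)

# Within-group index, as in the answer
dfN <- transform(df1, N = ave(seq_along(state), gender, state, FUN = seq_along))

# sep = "" yields name1/value1 instead of name.1/value.1
wide <- reshape(dfN, idvar = c("gender", "state"), timevar = "N",
                direction = "wide", sep = "")

# Interleave as value1, name1, value2, name2, ... and drop the leftover row names
pair_order <- as.vector(rbind(paste0("value", 1:3), paste0("name", 1:3)))
wide <- wide[, c("gender", "state", pair_order)]
rownames(wide) <- NULL
wide
```

The hard-coded 1:3 matches the largest group in this example; for other data it could be replaced by seq_len(max(dfN$N)).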