Undo reshape with arbitrary number of columns created
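I would like to undo a reshape after converting a long data frame to wide format by generating numbered versions of individual variables. My difficulty is doing this when there are multiple key variables and multiple variables that need to be regrouped. I tried using gather from tidyr to no avail. Take this long data as an example: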
toy = data.frame(
first_key = rep(c("A", "B", "C"), each = 6),
second_key = rep(rep(c(1:2), each = 3), 3),
colors = c("red", "yellow", "green", "blue", "purple", "beige"),
days = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"),
index = c(1:3)
)
This gives the following data.frame:
first_key second_key colors days index
A 1 red Monday 1
A 1 yellow Tuesday 2
A 1 green Wednesday 3
A 2 blue Thursday 1
A 2 purple Friday 2
A 2 beige Saturday 3
B 1 red Monday 1
B 1 yellow Tuesday 2
B 1 green Wednesday 3
B 2 blue Thursday 1
B 2 purple Friday 2
B 2 beige Saturday 3
C 1 red Monday 1
C 1 yellow Tuesday 2
C 1 green Wednesday 3
C 2 blue Thursday 1
C 2 purple Friday 2
C 2 beige Saturday 3
It is reshaped to wide format, with numbered versions of the variables, as follows:
toy_wide = reshape(toy, idvar = c("first_key", "second_key"),
timevar = "index", direction = "wide", sep = "_")
which gives this wide format:
first_key second_key colors_1 days_1 colors_2 days_2 colors_3 days_3
A 1 red Monday yellow Tuesday green Wednesday
A 2 blue Thursday purple Friday beige Saturday
B 1 red Monday yellow Tuesday green Wednesday
B 2 blue Thursday purple Friday beige Saturday
C 1 red Monday yellow Tuesday green Wednesday
C 2 blue Thursday purple Friday beige Saturday
But how can I get it back to the original format? I tried the following but got an error.
tidyr::gather(toy_wide, key = c("first_key", "second_key"), value = c("days", "colors"),
colors_1:days_3, factor_key = TRUE)
Error: Invalid column specification
If you go wide with reshape, go long with reshape as well:
reshape(toy_wide, idvar = c("first_key", "second_key"), timevar="index",
varying=3:8, direction="long", sep="_")
# first_key second_key index colors days
#A.1.1 A 1 1 red Monday
#A.2.1 A 2 1 blue Thursday
# ...
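If you specify the varying= set of variables (which can be a vector of column positions 3:8, column positions to drop -(1:2), or column names as a character vector c("a","b")) and sep=, then reshape will be able to guess the output variable names appropriately.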
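As a small illustration of the three ways to pass varying=, the sketch below rebuilds the example data and checks that each form leads to the same long result. The helper long_from and the names by_position, by_exclusion and by_name are made up for this example.

```r
# Rebuild the example data and its wide version.
toy <- data.frame(
  first_key  = rep(c("A", "B", "C"), each = 6),
  second_key = rep(rep(1:2, each = 3), 3),
  colors     = c("red", "yellow", "green", "blue", "purple", "beige"),
  days       = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"),
  index      = 1:3,
  stringsAsFactors = FALSE
)
toy_wide <- reshape(toy, idvar = c("first_key", "second_key"),
                    timevar = "index", direction = "wide", sep = "_")

# Hypothetical helper: go back to long format for a given `varying` spec.
long_from <- function(v) {
  reshape(toy_wide, idvar = c("first_key", "second_key"), timevar = "index",
          varying = v, direction = "long", sep = "_")
}

by_position  <- long_from(3:8)        # column positions to reshape
by_exclusion <- long_from(-(1:2))     # everything except the two key columns
by_name      <- long_from(c("colors_1", "days_1", "colors_2",
                            "days_2", "colors_3", "days_3"))

# Each call should return TRUE if the three specifications are equivalent.
all.equal(by_position, by_exclusion)
all.equal(by_position, by_name)
```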
Doing these kinds of reshapes in multiple steps often helps to keep things clear and makes them easier to automate:
ids <- c("first_key", "second_key")
reshape(toy_wide, idvar=ids, timevar="index",
varying=setdiff(names(toy_wide), ids), direction="long", sep="_")
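To confirm that this really restores the original data, a rough round-trip check could look like the following. It assumes the toy data from the question. The names toy_long, restored and same are introduced here only for illustration. Note that reshape returns the rows grouped by index and stores index as a number rather than an integer, so the sketch sorts and converts before comparing.

```r
toy <- data.frame(
  first_key  = rep(c("A", "B", "C"), each = 6),
  second_key = rep(rep(1:2, each = 3), 3),
  colors     = c("red", "yellow", "green", "blue", "purple", "beige"),
  days       = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"),
  index      = 1:3,
  stringsAsFactors = FALSE
)
ids <- c("first_key", "second_key")

# Long -> wide -> long again.
toy_wide <- reshape(toy, idvar = ids, timevar = "index",
                    direction = "wide", sep = "_")
toy_long <- reshape(toy_wide, idvar = ids, timevar = "index",
                    varying = setdiff(names(toy_wide), ids),
                    direction = "long", sep = "_")

# Put rows and columns back in the original order, then drop the
# generated row names and restore the integer type of `index`.
restored <- toy_long[order(toy_long$first_key, toy_long$second_key, toy_long$index),
                     names(toy)]
rownames(restored) <- NULL
restored$index <- as.integer(restored$index)

# Compare column by column so leftover reshape attributes do not matter.
same <- all(vapply(names(toy),
                   function(nm) identical(restored[[nm]], toy[[nm]]),
                   logical(1)))
same
```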
Here is another option with melt from data.table, which can take multiple measure patterns.
library(data.table)
melt(setDT(toy_wide), measure = patterns("^colors", "^days"),
value.name = c("colors", "days"), variable.name = "index")[order(first_key, second_key)]
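Since the question started from tidyr, it may be worth noting that gather takes a single key name and a single value name, so it is not designed to split several numbered variables at once. A possible alternative, assuming tidyr 1.1.0 or later is installed, is pivot_longer with the special ".value" entry in names_to. This is only a sketch and not part of the original answers. The name toy_long is made up here.

```r
library(tidyr)

toy <- data.frame(
  first_key  = rep(c("A", "B", "C"), each = 6),
  second_key = rep(rep(1:2, each = 3), 3),
  colors     = c("red", "yellow", "green", "blue", "purple", "beige"),
  days       = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"),
  index      = 1:3,
  stringsAsFactors = FALSE
)
toy_wide <- reshape(toy, idvar = c("first_key", "second_key"),
                    timevar = "index", direction = "wide", sep = "_")

# Split each column name at "_": the part before it becomes an output
# column (".value"), the part after it goes into the `index` column.
toy_long <- pivot_longer(
  toy_wide,
  cols = -c(first_key, second_key),
  names_to = c(".value", "index"),
  names_sep = "_",
  names_transform = list(index = as.integer)  # "1" -> 1L, and so on
)
toy_long
```

The result is a tibble rather than a plain data.frame, so wrap it in as.data.frame() if the original class matters.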