Undo reshape with arbitrary number of columns created

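I want to undo a reshape after converting a long data frame to wide format, where the wide columns are numbered versions of the original variables. The difficulty is doing this when there are several key variables and several variables that need to be recombined. I tried gather from tidyr, to no avail. Take this long data as an example: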

 toy = data.frame(
   first_key = rep(c("A", "B", "C"), each = 6),
   second_key = rep(rep(c(1:2), each = 3), 3),
   colors = c("red", "yellow", "green", "blue", "purple", "beige"),
   days = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"), 
   index = c(1:3)
 )

This gives the following data.frame:

first_key second_key colors      days index
       A          1    red    Monday     1
       A          1 yellow   Tuesday     2
       A          1  green Wednesday     3
       A          2   blue  Thursday     1
       A          2 purple    Friday     2
       A          2  beige  Saturday     3
       B          1    red    Monday     1
       B          1 yellow   Tuesday     2
       B          1  green Wednesday     3
       B          2   blue  Thursday     1
       B          2 purple    Friday     2
       B          2  beige  Saturday     3
       C          1    red    Monday     1
       C          1 yellow   Tuesday     2
       C          1  green Wednesday     3
       C          2   blue  Thursday     1
       C          2 purple    Friday     2
       C          2  beige  Saturday     3

Reshaping it to wide format with numbered versions of the variables, like so:

toy_wide = reshape(toy, idvar = c("first_key", "second_key"),
           timevar = "index", direction = "wide", sep = "_")

gives this wide format:

first_key second_key colors_1   days_1 colors_2  days_2 colors_3    days_3
       A          1      red   Monday   yellow Tuesday    green Wednesday
       A          2     blue Thursday   purple  Friday    beige  Saturday
       B          1      red   Monday   yellow Tuesday    green Wednesday
       B          2     blue Thursday   purple  Friday    beige  Saturday
       C          1      red   Monday   yellow Tuesday    green Wednesday
       C          2     blue Thursday   purple  Friday    beige  Saturday

But how can I get it back to the original format? I tried the following but got an error.

tidyr::gather(toy_wide, key = c("first_key", "second_key"), value = c("days", "colors"),
       colors_1:days_3, factor_key = TRUE)

Error: Invalid column specification

If you went wide with reshape, go long with reshape as well:

reshape(toy_wide, idvar = c("first_key", "second_key"), timevar="index",
        varying=3:8, direction="long", sep="_")

#      first_key second_key index colors      days
#A.1.1         A          1     1    red    Monday
#A.2.1         A          2     1   blue  Thursday
# ...

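If you specify the set of varying= variables (which can be a vector of column positions 3:8, column positions to drop -(1:2), or column names as a character vector c("a","b")) together with sep=, then reshape will be able to guess the output variable names appropriately.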
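As a minimal sketch of that point, the three ways of writing varying= can be compared directly on the example data. The data is rebuilt here so the snippet runs on its own; the helper name to_long is hypothetical and exists only for this illustration.

```r
# Rebuild the example data from the question and reshape it to wide format
toy <- data.frame(
  first_key = rep(c("A", "B", "C"), each = 6),
  second_key = rep(rep(1:2, each = 3), 3),
  colors = c("red", "yellow", "green", "blue", "purple", "beige"),
  days = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"),
  index = 1:3
)
toy_wide <- reshape(toy, idvar = c("first_key", "second_key"),
                    timevar = "index", direction = "wide", sep = "_")

# Hypothetical helper: reshape back to long for a given `varying` specification
to_long <- function(varying) {
  reshape(toy_wide, idvar = c("first_key", "second_key"), timevar = "index",
          varying = varying, direction = "long", sep = "_")
}

by_position  <- to_long(3:8)                  # column positions to reshape
by_exclusion <- to_long(-(1:2))               # every column except the two keys
by_name      <- to_long(names(toy_wide)[3:8]) # column names as a character vector

# The three specifications should describe the same set of columns,
# so the resulting long data frames should match
identical(by_position, by_exclusion)
identical(by_position, by_name)

# The names guessed from "colors_1", "days_1", ... by splitting on sep = "_"
names(by_position)
```

In each case reshape splits the wide column names on the separator, taking the part before "_" as the variable name and the part after it as the value of index.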

Doing these kinds of reshapes in several steps often helps keep things clear and makes them easier to automate:

ids <- c("first_key", "second_key")
reshape(toy_wide, idvar=ids, timevar="index",
        varying=setdiff(names(toy_wide), ids), direction="long", sep="_")
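To check that this really undoes the original reshape, the long result can be compared with the starting data. The following is a sketch under a few assumptions: the data is rebuilt so the snippet is self-contained, the names toy_long and toy_back are introduced only for illustration, and attributes are ignored in the comparison because reshape leaves its own row names and bookkeeping attributes on the result.

```r
# Rebuild the example data from the question and reshape it to wide format
toy <- data.frame(
  first_key = rep(c("A", "B", "C"), each = 6),
  second_key = rep(rep(1:2, each = 3), 3),
  colors = c("red", "yellow", "green", "blue", "purple", "beige"),
  days = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"),
  index = 1:3
)
ids <- c("first_key", "second_key")
toy_wide <- reshape(toy, idvar = ids, timevar = "index",
                    direction = "wide", sep = "_")

# Reshape back to long; every column that is not a key is treated as varying
toy_long <- reshape(toy_wide, idvar = ids, timevar = "index",
                    varying = setdiff(names(toy_wide), ids),
                    direction = "long", sep = "_")

# Restore the original row order and column order, and drop the
# "A.1.1"-style row names that reshape() generates
row_order <- order(toy_long$first_key, toy_long$second_key, toy_long$index)
toy_back <- toy_long[row_order, names(toy)]
rownames(toy_back) <- NULL

# Compare values only, ignoring attributes
all.equal(toy_back, toy, check.attributes = FALSE)
```

Note that index comes back as a numeric column rather than the original integer, which all.equal treats as equal but identical would not.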

Here is another option using melt from data.table, which can take multiple measure patterns.

library(data.table)
melt(setDT(toy_wide), measure = patterns("^colors", "^days"), 
   value.name = c("colors", "days"), variable.name = "index")[order(first_key, second_key)]