Casting and Melting with reshape in R

Question

As an example, suppose I have the following data frame:

datas=data.frame(Variables=c("Power","Happiness","Power","Happiness"),
Country=c("France", "France", "UK", "UK"), y2000=c(1213,1872,1726,2234), y2001=c(1234,2345,6433,9082))

which produces the following output:

  Variables Country y2000 y2001
1 Power     France   1213  1234
2 Happiness France   1872  2345
3 Power     UK       1726  6433
4 Happiness UK       2234  9082

I would like to reshape this data frame as follows:

  Year      Country  Power Happiness
1 2000      France    1213      1872  
2 2001      France    1234      2345
3 2000      UK        1726      2234
4 2001      UK        6433      9082

I started with:

q2=cast(datas, Country~Variables, value="2000")

but then I get the following error:

Aggregation requires fun.aggregate: length used as default
Error in `[.data.frame`(sort_df(data, variables), , c(variables, "value"),  : 
  undefined columns selected

Any suggestions? Also: my data frame is really large (417120 x 62). Does that affect the choice of solution?

Answer 1

Perhaps you are interested in a tidyverse alternative:

library(tidyverse)
df %>%
    gather(Year, val, -Variables, -Country) %>%
    spread(Variables, val)
#  Country Year Happiness Power
#1  France 2000      1872  1213
#2  France 2001      2345  1234
#3      UK 2000      2234  1726
#4      UK 2001      9082  6433
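
As far as I know, gather and spread are marked as superseded in tidyr 1.0.0 and later, with pivot_longer and pivot_wider as their replacements. The following is a minimal sketch of the same reshape using those functions; it assumes tidyr >= 1.0.0 is installed, and it rebuilds the small sample data inline so that it runs on its own (the names long and wide are just illustrative).

```r
library(tidyr)

# Small sample data with bare year column names (check.names = FALSE keeps "2000"/"2001")
df <- data.frame(
    Variables = c("Power", "Happiness", "Power", "Happiness"),
    Country = c("France", "France", "UK", "UK"),
    `2000` = c(1213, 1872, 1726, 2234),
    `2001` = c(1234, 2345, 6433, 9082),
    check.names = FALSE)

# Wide -> long: stack every column except the two id columns
long <- pivot_longer(df, cols = -c(Variables, Country),
                     names_to = "Year", values_to = "val")

# Long -> wide: one column per distinct entry of Variables
wide <- pivot_wider(long, names_from = Variables, values_from = val)
wide
```

Note that Year comes back as a character column here; as.integer(wide$Year) converts it if a numeric year is needed.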

Or use reshape2::melt and reshape2::dcast:

reshape2::dcast(
    reshape2::melt(df, id.vars = c("Country", "Variables"), variable.name = "Year"),
    Country + Year ~ Variables)
#  Country Year Happiness Power
#1  France 2000      1872  1213
#2  France 2001      2345  1234
#3      UK 2000      2234  1726
#4      UK 2001      9082  6433

Or (identically) use data.table::melt and data.table::dcast:

data.table::dcast(
    data.table::melt(df, id.vars = c("Country", "Variables"), variable.name = "Year"), 
    Country + Year ~ Variables)
#  Country Year Happiness Power
#1  France 2000      1872  1213
#2  France 2001      2345  1234
#3      UK 2000      2234  1726
#4      UK 2001      9082  6433
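
One caveat, stated as an assumption about package versions rather than a certainty: data.table's melt and dcast are written for data.table inputs, and depending on the installed version, passing a plain data.frame may produce a warning or fail. A hedged sketch that converts first with as.data.table (sample data rebuilt inline; dt, long and wide are illustrative names):

```r
library(data.table)

# Small sample data with bare year column names
df <- data.frame(
    Variables = c("Power", "Happiness", "Power", "Happiness"),
    Country = c("France", "France", "UK", "UK"),
    `2000` = c(1213, 1872, 1726, 2234),
    `2001` = c(1234, 2345, 6433, 9082),
    check.names = FALSE)

# Convert explicitly so that the data.table methods are dispatched
dt <- as.data.table(df)

# Wide -> long, then long -> wide with one column per entry of Variables
long <- melt(dt, id.vars = c("Country", "Variables"), variable.name = "Year")
wide <- dcast(long, Country + Year ~ Variables, value.var = "value")
wide
```

For a table of the size mentioned in the question, converting once and staying in data.table avoids repeated copies, though the actual gain would need to be measured.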

In terms of performance/runtime, I think the data.table or tidyr solutions are the most efficient. You can check this by running a microbenchmark on some larger sample data.

Sample data

df <-read.table(text =
    "  Variables Country  2000  2001
1 Power     France   1213  1234
2 Happiness France   1872  2345
3 Power     UK       1726  6433
4 Happiness UK       2234  9082", header = T)
colnames(df)[3:4] <- c("2000", "2001")

Benchmark analysis

The following results come from a microbenchmark analysis of the four methods, based on a (slightly) larger 78x22 sample dataset.

set.seed(2017)
df <- data.frame(
    Variables = rep(c("Power", "Happiness", "something_else"), 26),
    Country = rep(LETTERS[1:26], each = 3),
    matrix(sample(10000, 20 * 26 * 3), nrow = 26 * 3))
colnames(df)[3:ncol(df)] <- 2000:2019

library(microbenchmark)
library(tidyr)

res <- microbenchmark(
    reshape2 = {
        reshape2::dcast(
            reshape2::melt(df, id.vars = c("Country", "Variables"), variable.name = "Year"),
            Country + Year ~ Variables)
    },
    tidyr = {
        df %>%
            gather(Year, val, -Variables, -Country) %>%
            spread(Variables, val)
    },
    datatable = {
        data.table::dcast(
            data.table::melt(df, id.vars = c("Country", "Variables"), variable.name = "Year"),
            Country + Year ~ Variables)
    },
    reshape = {
        reshape::cast(reshape::melt(df), Country + variable ~ Variables)
    }
)
res
#Unit: milliseconds
#      expr       min        lq      mean    median        uq       max neval
#  reshape2  3.088740  3.449686  4.313044  3.919372  5.112560  7.856902   100
#     tidyr  4.482361  4.982017  6.215872  5.771133  6.931964 28.293377   100
# datatable  3.179035  3.511542  4.861192  4.040188  5.123103 46.010810   100
#   reshape 27.371094 30.226222 32.425667 32.504644 34.118499 41.286803   100

library(ggplot2)
autoplot(res)

Answer 2

As mentioned above, I would strongly recommend using tidyr instead of reshape, or at least reshape2 instead of reshape, since it fixes many of reshape's performance issues.

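In reshape itself, you have to melt datas first. cast expects molten (long-format) data with a value column, and the name passed as value="2000" does not match any column of datas (the year columns are named y2000 and y2001), which appears to be why the call in the question fails: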

> cast(melt(datas), Country + variable ~ Variables)
Using Variables, Country as id variables
  Country variable Happiness Power
1  France    y2000      1872  1213
2  France    y2001      2345  1234
3      UK    y2000      2234  1726
4      UK    y2001      9082  6433

Then rename and convert the columns as needed.
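
As an illustration of that last step, here is a minimal base R sketch. It assumes the cast output shown above, rebuilt by hand as a data frame named res (a hypothetical name) so that the snippet runs without reshape installed; the target column order follows the layout requested in the question.

```r
# Stand-in for the output of cast(melt(datas), Country + variable ~ Variables)
res <- data.frame(
    Country = c("France", "France", "UK", "UK"),
    variable = factor(c("y2000", "y2001", "y2000", "y2001")),
    Happiness = c(1872, 2345, 2234, 9082),
    Power = c(1213, 1234, 1726, 6433))

# Rename the melted column to Year
names(res)[names(res) == "variable"] <- "Year"

# Strip the leading "y" and convert the year labels to integers
res$Year <- as.integer(sub("^y", "", as.character(res$Year)))

# Reorder the columns to match the layout asked for in the question
res <- res[, c("Year", "Country", "Power", "Happiness")]
res
```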

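In reshape2 the code is the same, except that you use dcast instead of cast. The tidyr approach in @Maurits Evers's answer above is the better solution, since most development has moved from reshape2 to the tidyverse.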
