How to reorganize data with the function `gather` (or similar) to reduce four variables to two

Question

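I have a data frame df1 that summarizes the mean number of animals per 6-hour interval for each area (mean_A and mean_B). I also have the standard errors of these means (Se_A and Se_B). For example: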

df1<-data.frame(Hour=c(0,6,12,18,24),
                mean_A= c(7.3,6.8,8.9,3.4,12.1),
                mean_B=c(6.3,8.2,3.1,4.8,13.2),
                Se_A=c(1.3,2.1,0.9,3.2,0.8),
                Se_B=c(0.9,0.3,1.8,1.1,1.3))

> df1

  Hour mean_A mean_B Se_A Se_B
1    0    7.3    6.3  1.3  0.9
2    6    6.8    8.2  2.1  0.3
3   12    8.9    3.1  0.9  1.8
4   18    3.4    4.8  3.2  1.1
5   24   12.1   13.2  0.8  1.3

For plotting purposes, I need to reorganize the data frame. What I need is this (or something similar):

> df1
   Hour meanType meanValue  Se
1     0   mean_A       7.3 1.3
2     6   mean_A       6.8 2.1
3    12   mean_A       8.9 0.9
4    18   mean_A       3.4 3.2
5    24   mean_A      12.1 0.8
6     0   mean_B       6.3 0.9
7     6   mean_B       8.2 0.3
8    12   mean_B       3.1 1.8
9    18   mean_B       4.8 1.1
10   24   mean_B      13.2 1.3

Does anyone know how to do this?

Answer 1

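We can use melt from data.table, which makes this easier because it has built-in support for multiple measure patterns, creating a separate value column for each pattern when reshaping from 'wide' to 'long':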

library(data.table)
melt(setDT(df1), measure = patterns("^mean", "^Se"), 
      variable.name = "meanType", value.name = c("meanValue", "Se"))[,
        meanType := names(df1)[2:3][meanType]][]
#    Hour meanType meanValue  Se
# 1:    0   mean_A       7.3 1.3
# 2:    6   mean_A       6.8 2.1
# 3:   12   mean_A       8.9 0.9
# 4:   18   mean_A       3.4 3.2
# 5:   24   mean_A      12.1 0.8
# 6:    0   mean_B       6.3 0.9
# 7:    6   mean_B       8.2 0.3
# 8:   12   mean_B       3.1 1.8
# 9:   18   mean_B       4.8 1.1
#10:   24   mean_B      13.2 1.3
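
The last step of the chain above is needed because, with several measure patterns, melt records the position of each column within its pattern (1, 2, ...) in the variable column instead of the original column names. A minimal sketch of that mapping, assuming data.table is installed; `long` and `labels` are names introduced here purely for illustration, and as.data.table is used instead of setDT so that df1 is not modified by reference:

```r
library(data.table)

df1 <- data.frame(Hour   = c(0, 6, 12, 18, 24),
                  mean_A = c(7.3, 6.8, 8.9, 3.4, 12.1),
                  mean_B = c(6.3, 8.2, 3.1, 4.8, 13.2),
                  Se_A   = c(1.3, 2.1, 0.9, 3.2, 0.8),
                  Se_B   = c(0.9, 0.3, 1.8, 1.1, 1.3))

# as.data.table() makes a copy, so df1 itself stays a plain data.frame
long <- melt(as.data.table(df1),
             measure.vars  = patterns("^mean", "^Se"),
             variable.name = "meanType",
             value.name    = c("meanValue", "Se"))

# At this point meanType holds group positions (1, 2), not column names
unique(long$meanType)

# Map those positions back to the matching "mean" column names
labels <- grep("^mean", names(df1), value = TRUE)
long[, meanType := labels[as.integer(meanType)]]
long
```

Looking the labels up with grep rather than a hard-coded `2:3` keeps the mapping correct if the column order of df1 changes.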

If we need a tidyverse approach:

library(tidyverse)
gather(df1, meanType, val, -Hour) %>% 
   separate(meanType, into = c("meanType1", "meanType")) %>%  
   spread(meanType1, val) %>%
   mutate(meanType = str_c("mean_", meanType)) %>%
   arrange(meanType)
#   Hour meanType mean  Se
#1     0   mean_A  7.3 1.3
#2     6   mean_A  6.8 2.1
#3    12   mean_A  8.9 0.9
#4    18   mean_A  3.4 3.2
#5    24   mean_A 12.1 0.8
#6     0   mean_B  6.3 0.9
#7     6   mean_B  8.2 0.3
#8    12   mean_B  3.1 1.8
#9    18   mean_B  4.8 1.1
#10   24   mean_B 13.2 1.3

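Note: gather works here, but make sure to check the types of the columns before gathering. Since the columns being gathered are all numeric, this is not an issue. When the columns have different types and we gather them into a single column, we may need type_convert (from readr) after the spread step.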
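
To illustrate the type caveat, here is a small sketch assuming tidyr and readr are installed. The data frame `mixed` and its `flag_A` column are hypothetical and not part of the question's data; they exist only to show what happens when a numeric and a character column are gathered together:

```r
library(tidyr)
library(readr)

# Hypothetical data: one numeric and one character measure column
mixed <- data.frame(Hour   = c(0, 6),
                    mean_A = c(7.3, 6.8),
                    flag_A = c("ok", "low"),
                    stringsAsFactors = FALSE)

# Gathering both measure columns forces a single common type
long <- gather(mixed, key, val, -Hour)
class(long$val)

# Spreading back does not restore the original types by itself
wide <- spread(long, key, val)
class(wide$mean_A)

# type_convert() re-guesses the type of each character column
fixed <- type_convert(wide)
class(fixed$mean_A)
```

In the question's data every gathered column is numeric, so this conversion step is unnecessary there.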

Answer 2

Using reshape from base R:

reshape(df1, idvar = "Hour", varying = 2:5, direction = "long", sep = "_", timevar = "type")
#     Hour type mean  Se
#0.A     0    A  7.3 1.3
#6.A     6    A  6.8 2.1
#12.A   12    A  8.9 0.9
#18.A   18    A  3.4 3.2
#24.A   24    A 12.1 0.8
#0.B     0    B  6.3 0.9
#6.B     6    B  8.2 0.3
#12.B   12    B  3.1 1.8
#18.B   18    B  4.8 1.1
#24.B   24    B 13.2 1.3
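
The result above differs from the requested layout only in its row names, column names, and the labels in the type column. A possible clean-up in base R is sketched below; the target names meanType and meanValue are taken from the question, and `long` is a name introduced here for illustration:

```r
df1 <- data.frame(Hour   = c(0, 6, 12, 18, 24),
                  mean_A = c(7.3, 6.8, 8.9, 3.4, 12.1),
                  mean_B = c(6.3, 8.2, 3.1, 4.8, 13.2),
                  Se_A   = c(1.3, 2.1, 0.9, 3.2, 0.8),
                  Se_B   = c(0.9, 0.3, 1.8, 1.1, 1.3))

# sep = "_" splits each name into a value name (mean, Se) and a time (A, B)
long <- reshape(df1, idvar = "Hour", varying = 2:5, direction = "long",
                sep = "_", timevar = "type")

# Replace the "0.A", "6.A", ... row names with plain row numbers
rownames(long) <- NULL

# Restore the "mean_" prefix and rename columns to match the question
long$type <- paste0("mean_", long$type)
names(long)[names(long) == "type"] <- "meanType"
names(long)[names(long) == "mean"] <- "meanValue"
long
```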

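We can also use pivot_longer from tidyr (development version 0.8.3.9000 at the time of writing; the function is available in tidyr 1.0.0 and later):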

library(tidyr)
pivot_longer(df1, cols = -Hour, names_to = c(".value", "Type"), names_sep = "_")
# A tibble: 10 x 4
#    Hour Type   mean    Se
#   <dbl> <chr> <dbl> <dbl>
# 1     0 A       7.3   1.3
# 2     0 B       6.3   0.9
# 3     6 A       6.8   2.1
# 4     6 B       8.2   0.3
# 5    12 A       8.9   0.9
# 6    12 B       3.1   1.8
# 7    18 A       3.4   3.2
# 8    18 B       4.8   1.1
# 9    24 A      12.1   0.8
#10    24 B      13.2   1.3

From the vignette:

Note the special variable name .value: this tells pivot_longer() that that component of the variable name defines the name of the output value column.
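
As a sketch of what .value does with this data: the part of each column name before the underscore (mean, Se) becomes an output column, while the part after it (A, B) goes into the column named in names_to. The example assumes tidyr 1.0.0 or later; the renaming to meanType and meanValue follows the layout requested in the question, and `long` is a name introduced here for illustration:

```r
library(tidyr)

df1 <- data.frame(Hour   = c(0, 6, 12, 18, 24),
                  mean_A = c(7.3, 6.8, 8.9, 3.4, 12.1),
                  mean_B = c(6.3, 8.2, 3.1, 4.8, 13.2),
                  Se_A   = c(1.3, 2.1, 0.9, 3.2, 0.8),
                  Se_B   = c(0.9, 0.3, 1.8, 1.1, 1.3))

# ".value" takes the first name component, "meanType" the second
long <- pivot_longer(df1, cols = -Hour,
                     names_to  = c(".value", "meanType"),
                     names_sep = "_")

# The value columns are named after the first component
names(long)

# Adjust labels, names, and row order to match the requested layout
long$meanType <- paste0("mean_", long$meanType)
names(long)[names(long) == "mean"] <- "meanValue"
long <- long[order(long$meanType, long$Hour), ]
long
```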
