How to reorganize data with the function `gather` (or similar) to reduce four variables to two
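I have a data frame `df1` that summarizes the mean number of animals per 6-hour interval and per area (`mean_A` and `mean_B`). I also have the standard errors of these means (`Se_A` and `Se_B`). As an example: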
df1<-data.frame(Hour=c(0,6,12,18,24),
mean_A= c(7.3,6.8,8.9,3.4,12.1),
mean_B=c(6.3,8.2,3.1,4.8,13.2),
Se_A=c(1.3,2.1,0.9,3.2,0.8),
Se_B=c(0.9,0.3,1.8,1.1,1.3))
> df1
Hour mean_A mean_B Se_A Se_B
1 0 7.3 6.3 1.3 0.9
2 6 6.8 8.2 2.1 0.3
3 12 8.9 3.1 0.9 1.8
4 18 3.4 4.8 3.2 1.1
5 24 12.1 13.2 0.8 1.3
For plotting reasons, I need to reorganize the data frame. What I need is this (or something similar):
> df1
Hour meanType meanValue Se
1 0 mean_A 7.3 1.3
2 6 mean_A 6.8 2.1
3 12 mean_A 8.9 0.9
4 18 mean_A 3.4 3.2
5 24 mean_A 12.1 0.8
6 0 mean_B 6.3 0.9
7 6 mean_B 8.2 0.3
8 12 mean_B 3.1 1.8
9 18 mean_B 4.8 1.1
10 24 mean_B 13.2 1.3
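Does anyone know how to do it?
We can use `melt` from `data.table`. It makes this easier because it has built-in support for multiple `measure` `patterns`, so each pattern becomes its own value column when reshaping from 'wide' to 'long':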
library(data.table)
melt(setDT(df1), measure = patterns("^mean", "^Se"),
variable.name = "meanType", value.name = c("meanValue", "Se"))[,
meanType := names(df1)[2:3][meanType]][]
# Hour meanType meanValue Se
# 1: 0 mean_A 7.3 1.3
# 2: 6 mean_A 6.8 2.1
# 3: 12 mean_A 8.9 0.9
# 4: 18 mean_A 3.4 3.2
# 5: 24 mean_A 12.1 0.8
# 6: 0 mean_B 6.3 0.9
# 7: 6 mean_B 8.2 0.3
# 8: 12 mean_B 3.1 1.8
# 9: 18 mean_B 4.8 1.1
#10: 24 mean_B 13.2 1.3
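The final `meanType := names(df1)[2:3][meanType]` step in the call above is needed because, with several patterns, `melt` cannot name the groups itself and returns `meanType` as a factor that only numbers them. A minimal sketch of that intermediate state, using a fresh copy of the data (`dt` and `area_labels` are names I made up for illustration):

```r
library(data.table)

# Fresh copy of the example data, so df1 itself is not modified by reference
dt <- data.table(Hour   = c(0, 6, 12, 18, 24),
                 mean_A = c(7.3, 6.8, 8.9, 3.4, 12.1),
                 mean_B = c(6.3, 8.2, 3.1, 4.8, 13.2),
                 Se_A   = c(1.3, 2.1, 0.9, 3.2, 0.8),
                 Se_B   = c(0.9, 0.3, 1.8, 1.1, 1.3))

# One value column per pattern: "^mean" -> meanValue, "^Se" -> Se
long <- melt(dt, id.vars = "Hour",
             measure.vars = patterns("^mean", "^Se"),
             variable.name = "meanType",
             value.name = c("meanValue", "Se"))

# At this point meanType is a factor that indexes the position of each
# column within its pattern, not the original column name
str(long$meanType)

# Indexing a character vector with a factor uses its integer codes,
# so this maps position 1 to "mean_A" and position 2 to "mean_B"
area_labels <- c("mean_A", "mean_B")
long[, meanType := area_labels[meanType]]
print(long)
```

This relies on the `mean_*` and `Se_*` columns appearing in the same area order; if they did not, the pairing of means and standard errors would be wrong.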
If we need a `tidyverse` approach:
library(tidyverse)
gather(df1, meanType, val, -Hour) %>%
separate(meanType, into = c("meanType1", "meanType")) %>%
spread(meanType1, val) %>%
mutate(meanType = str_c("mean_", meanType)) %>%
arrange(meanType)
# Hour meanType mean Se
#1 0 mean_A 7.3 1.3
#2 6 mean_A 6.8 2.1
#3 12 mean_A 8.9 0.9
#4 18 mean_A 3.4 3.2
#5 24 mean_A 12.1 0.8
#6 0 mean_B 6.3 0.9
#7 6 mean_B 8.2 0.3
#8 12 mean_B 3.1 1.8
#9 18 mean_B 4.8 1.1
#10 24 mean_B 13.2 1.3
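Note: `gather` also works here, but make sure to check the column types before gathering. Since all the measure columns here are numeric, this is not a problem. When the columns have different types and we `gather` them into a single column, we may need `type_convert` (from `readr`) after the `spread` step.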
Using base R's `reshape`:
reshape(df1, idvar = "Hour", varying = 2:5, direction = "long", sep = "_", timevar = "type")
# Hour type mean Se
#0.A 0 A 7.3 1.3
#6.A 6 A 6.8 2.1
#12.A 12 A 8.9 0.9
#18.A 18 A 3.4 3.2
#24.A 24 A 12.1 0.8
#0.B 0 B 6.3 0.9
#6.B 6 B 8.2 0.3
#12.B 12 B 3.1 1.8
#18.B 18 B 4.8 1.1
#24.B 24 B 13.2 1.3
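The `reshape` result has composite row names and shorter labels than the layout asked for in the question. If the exact layout matters, a possible clean-up (a sketch, with column names taken from the question's desired output) could look like this:

```r
# Example data from the question
df1 <- data.frame(Hour   = c(0, 6, 12, 18, 24),
                  mean_A = c(7.3, 6.8, 8.9, 3.4, 12.1),
                  mean_B = c(6.3, 8.2, 3.1, 4.8, 13.2),
                  Se_A   = c(1.3, 2.1, 0.9, 3.2, 0.8),
                  Se_B   = c(0.9, 0.3, 1.8, 1.1, 1.3))

long <- reshape(df1, idvar = "Hour", varying = 2:5, direction = "long",
                sep = "_", timevar = "type")

# Drop the "0.A", "6.A", ... row names in favour of plain row numbers
rownames(long) <- NULL

# Restore the "mean_A" / "mean_B" labels and the requested column names
long$type <- paste0("mean_", long$type)
names(long) <- c("Hour", "meanType", "meanValue", "Se")
print(long)
```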
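We can also use `pivot_longer` from `tidyr` (development version 0.8.3.9000 at the time of writing; the function has been part of the CRAN release since tidyr 1.0.0):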
library(tidyr)
pivot_longer(df1, cols = -Hour, names_to = c(".value", "Type"), names_sep = "_")
# A tibble: 10 x 4
# Hour Type mean Se
# <dbl> <chr> <dbl> <dbl>
# 1 0 A 7.3 1.3
# 2 0 B 6.3 0.9
# 3 6 A 6.8 2.1
# 4 6 B 8.2 0.3
# 5 12 A 8.9 0.9
# 6 12 B 3.1 1.8
# 7 18 A 3.4 3.2
# 8 18 B 4.8 1.1
# 9 24 A 12.1 0.8
#10 24 B 13.2 1.3
From the vignette:
"Note the special variable name `.value`: this tells `pivot_longer()` that that component of the variable name defines the name of the output value column."
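Because `.value` takes the output column names from the first part of each original name, the result has `mean` and `Se` columns and bare `A`/`B` labels. To get closer to the layout in the question, one option (a sketch; the renaming and ordering steps are my own addition, not part of the vignette) is:

```r
library(tidyr)

# Example data from the question
df1 <- data.frame(Hour   = c(0, 6, 12, 18, 24),
                  mean_A = c(7.3, 6.8, 8.9, 3.4, 12.1),
                  mean_B = c(6.3, 8.2, 3.1, 4.8, 13.2),
                  Se_A   = c(1.3, 2.1, 0.9, 3.2, 0.8),
                  Se_B   = c(0.9, 0.3, 1.8, 1.1, 1.3))

# "mean_A" splits into ".value" = "mean" (a column name) and meanType = "A"
long <- pivot_longer(df1, cols = -Hour,
                     names_to = c(".value", "meanType"),
                     names_sep = "_")

# Rebuild the labels and column name used in the question
long$meanType <- paste0("mean_", long$meanType)
names(long)[names(long) == "mean"] <- "meanValue"

# Order by area first, then by hour, as in the desired output
long <- long[order(long$meanType, long$Hour), ]
print(long)
```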