How to convert long-format data to wide format over multiple variables (columns) and stack the results on top of each other?
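I have monthly time series data (1987-2017) for 20 stations. I want to convert the long-format data to wide format so that the data for all 20 stations ends up in a single data frame.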
head(Monthly_rainfall2[-1:-20,1:5]) #long format data
# A tibble: 6 x 5
Year Month stn1 stn2 stn3
<chr> <ord> <dbl> <dbl> <dbl>
1 1987 Jan NA NA 0
2 1987 Feb NA NA 60.5
3 1987 Mar NA NA 66
4 1987 Apr NA NA 64
5 1987 May NA NA 183.
6 1987 Jun NA NA 216
Note that the Month column is an ordered factor.
dput(Monthly_rainfall2[21:50,1:4])
structure(list(Year = c("1987", "1987", "1987", "1987", "1987",
"1987", "1987", "1987", "1987", "1987", "1987", "1987", "1988",
"1988", "1988", "1988", "1988", "1988", "1988", "1988", "1988",
"1988", "1988", "1988", "1989", "1989", "1989", "1989", "1989",
"1989"), Month = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("Jan", "Feb", "Mar",
"Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
), class = c("ordered", "factor")), stn1 = c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), stn2 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -30L))
I tried the code below:
library(tidyr)
wide_data <- spread(Monthly_rainfall2[1:3], Month, stn1 )
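The code above gives me what I want for a single station, but I cannot produce the result for all stations at once in one data frame.
I want my data frame to look like this, where stn2 starts immediately after stn1 and all subsequent stations follow the same pattern: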
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#stn1
1987 0.8 0.5 0.8 2 20 25 30 30 21 22 3 0
1988 1 1.2 1.8 2 20 22 25 21 15 12 10 9
...
2017 0.5 1 14 19 17 14 15 13 10 14 18 10
#stn2
1987 0.8 0.5 0.8 2 20 25 30 30 21 22 3 0
1988 1 1.2 1.8 2 20 22 25 21 15 12 10 9
...
2017 0.5 1 14 19 17 14 15 13 10 14 18 10
#stn3
1987 0.8 0.5 0.8 2 20 25 30 30 21 22 3 0
1988 1 1.2 1.8 2 20 22 25 21 15 12 10 9
...
2017 0.5 1 14 19 17 14 15 13 10 14 18 10
We can gather the station columns into 'long' format and then do a spread:
library(tidyverse)
df1 %>%
gather(key, val, Bhur:Chamkhar) %>%
spread(Month, val)
Note: the OP's dput has only NAs for the "Bhur/Chamkhar" columns.
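Since tidyr 1.0.0, gather and spread are superseded by pivot_longer and pivot_wider, so the same two steps can also be sketched with the newer functions. The toy data frame below is hypothetical (made-up values, two stations named stn1 and stn2 as in the question's printout), and the names toy, long, wide, stn and val are illustrative assumptions, not part of the OP's data.

```r
library(tidyr)

# Hypothetical toy data shaped like the question's: Year, an ordered Month
# factor, and one column per station (values are made up for illustration).
toy <- data.frame(
  Year  = rep(c("1987", "1988"), each = 12),
  Month = factor(rep(month.abb, times = 2), levels = month.abb, ordered = TRUE),
  stn1  = 1:24,
  stn2  = 101:124
)

# Stack the station columns into (stn, val) pairs
long <- pivot_longer(toy, cols = starts_with("stn"),
                     names_to = "stn", values_to = "val")

# Keep stations in their original column order (avoids "stn10" sorting before "stn2")
long$stn <- factor(long$stn, levels = unique(long$stn))

# Spread the months into columns: one row per Year and station
wide <- pivot_wider(long, names_from = Month, values_from = val)

# Put all years of stn1 first, then stn2, and so on
wide <- wide[order(wide$stn, wide$Year), ]
wide
```

The final ordering step is what produces the stacked layout from the question; the stn column plays the role of the #stn1, #stn2 separators.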
A solution using data.table:
library(data.table)
library(magrittr)  # provides %>%
DT <- as.data.table(df)  # df: the data from the question's dput
# fill the NAs with random values, for illustration only
DT[, c("stn1", "stn2") := .(sample(1:100, 30), sample(1:100, 30))]
dcast(DT, Month ~ Year, value.var = c("stn1", "stn2")) %>%
  melt(id.vars = 1, variable.name = "year") %>%
  dcast(year ~ Month, value.var = "value") -> DT2
year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1: stn1_1987 98 5 89 84 75 36 4 34 26 78 33 28
2: stn1_1988 67 74 40 63 9 19 79 61 66 93 47 62
3: stn1_1989 68 29 7 46 54 87 NA NA NA NA NA NA
4: stn2_1987 31 61 74 89 46 54 70 80 84 6 96 32
5: stn2_1988 75 71 11 99 20 7 77 13 52 14 2 41
6: stn2_1989 83 22 97 43 59 15 NA NA NA NA NA NA
If you want year and stn in separate columns, you can do this:
DT2[, c("stn", "year") := tstrsplit(year, "_", fixed=TRUE)]
year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec stn
1: 1987 98 5 89 84 75 36 4 34 26 78 33 28 stn1
2: 1988 67 74 40 63 9 19 79 61 66 93 47 62 stn1
3: 1989 68 29 7 46 54 87 NA NA NA NA NA NA stn1
4: 1987 31 61 74 89 46 54 70 80 84 6 96 32 stn2
5: 1988 75 71 11 99 20 7 77 13 52 14 2 41 stn2
6: 1989 83 22 97 43 59 15 NA NA NA NA NA NA stn2
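As a possible shortcut, the same stacked table can probably be reached with a single melt followed by a single dcast, which keeps stn and Year as separate columns from the start and so needs no string splitting. This is only a sketch on hypothetical toy data (made-up values); the names DT, long, wide, stn and val are assumptions for illustration.

```r
library(data.table)

# Hypothetical toy data shaped like the question's (values are made up)
DT <- data.table(
  Year  = rep(c("1987", "1988"), each = 12),
  Month = factor(rep(month.abb, times = 2), levels = month.abb, ordered = TRUE),
  stn1  = 1:24,
  stn2  = 101:124
)

# Stack the station columns into a single "stn" column
long <- melt(DT, id.vars = c("Year", "Month"),
             variable.name = "stn", value.name = "val")

# One row per station and year, one column per month
wide <- dcast(long, stn + Year ~ Month, value.var = "val")
wide
```

Because dcast sorts its result by the left-hand side of the formula, the rows come out grouped by station and then by year, which matches the layout asked for in the question.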