How to produce a year-over-year calculated column in R
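First the data, then the manipulation, and finally my current approach, which so far produces no usable values. The manipulation creates a date and then a rolling 12-month average.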
Monthavg<-
c(20185,20186,20187,20188,20189,201810,201811,201812,20191,20192,20193,20194,20195,20196,
20197,20198,20199,201910,201911,201912,20201
,20202,20203,20204,20205,20206,20207
,20208,20209,202010,202011)
empavg<-c(2,4,6,7,8,10,12,14,16,18,20,22,24,26,28,30,32,36,36,38,40,42,44,46,48,48,50,52,52,54,56)
ces12f <- data.frame(Monthavg,empavg)
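Manipulation
library(dplyr)  # provides %>% and mutate(); zoo is called via zoo::

# Split the YYYYM / YYYYMM key into year and month, then build a first-of-month date
ces12f <- ces12f %>%
  mutate(year  = substr(as.character(Monthavg), 1, 4),
         month = substr(as.character(Monthavg), 5, 7),
         date  = as.Date(paste(year, month, "1", sep = "-")))
Month_ord <- order(Monthavg)
span_month <- 12
# Right-aligned rolling 12-month mean; the first 11 rows are NA
ces12f <- ces12f %>% mutate(ravg = zoo::rollmeanr(empavg, span_month, fill = NA))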
Year-over-year difference attempt
ces12f<- ces12f%>%
group_by(Monthavg)%>%
mutate(PreviousYear=lag(ravg,12),
PreviousMonth=lag(ravg),
AnnualDifference=ravg-PreviousYear)%>%
ungroup()
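The end goal is 202011 minus 201911, i.e. 47.5 minus 25.17, or 22.3. The approach I used above produces only NA. Any insight on how to modify the existing code, or on a completely different method, would be much appreciated.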
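As a sanity check on the target figure, the two rolling means can be recomputed by hand in base R. This is only a sketch: `roll12` is a hypothetical helper introduced here, and the positions 31 and 19 assume the rows are in the order given in the data above.

```r
empavg <- c(2,4,6,7,8,10,12,14,16,18,20,22,24,26,28,30,32,36,36,38,
            40,42,44,46,48,48,50,52,52,54,56)

# Right-aligned 12-month mean ending at position i (hypothetical helper)
roll12 <- function(x, i) mean(x[(i - 11):i])

nov2020 <- roll12(empavg, 31)  # 202011 is the 31st observation
nov2019 <- roll12(empavg, 19)  # 201911 is the 19th observation

round(c(nov2020, nov2019, nov2020 - nov2019), 2)
# 47.50 25.17 22.33
```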
Here is one approach using tidyr::extract. You can use tidyr::complete to make sure any missing months are filled in:
library(tidyverse)
library(zoo)
ces12f %>%
mutate(Monthavg = as.character(Monthavg)) %>%
extract(Monthavg, into = c("Year", "Month"),
regex = "^([0-9]{4})([0-9]{1,2})$") %>%
mutate(across(Year:Month, as.integer)) %>%
arrange(Year,Month) %>%
complete(Year, Month) %>%
mutate(ravg = zoo::rollmeanr(empavg,12,NA)) %>%
mutate(PreviousYear=lag(ravg,12),
PreviousMonth=lag(ravg),
AnnualDifference=ravg-PreviousYear)
Year Month empavg ravg PreviousYear PreviousMonth AnnualDifference
1 2018 1 NA NA NA NA NA
2 2018 2 NA NA NA NA NA
3 2018 3 NA NA NA NA NA
4 2018 4 NA NA NA NA NA
5 2018 5 2 NA NA NA NA
6 2018 6 4 NA NA NA NA
7 2018 7 6 NA NA NA NA
8 2018 8 7 NA NA NA NA
9 2018 9 8 NA NA NA NA
10 2018 10 10 NA NA NA NA
11 2018 11 12 NA NA NA NA
12 2018 12 14 NA NA NA NA
13 2019 1 16 NA NA NA NA
14 2019 2 18 NA NA NA NA
15 2019 3 20 NA NA NA NA
16 2019 4 22 11.58333 NA NA NA
17 2019 5 24 13.41667 NA 11.58333 NA
18 2019 6 26 15.25000 NA 13.41667 NA
19 2019 7 28 17.08333 NA 15.25000 NA
20 2019 8 30 19.00000 NA 17.08333 NA
21 2019 9 32 21.00000 NA 19.00000 NA
22 2019 10 36 23.16667 NA 21.00000 NA
23 2019 11 36 25.16667 NA 23.16667 NA
24 2019 12 38 27.16667 NA 25.16667 NA
25 2020 1 40 29.16667 NA 27.16667 NA
26 2020 2 42 31.16667 NA 29.16667 NA
27 2020 3 44 33.16667 NA 31.16667 NA
28 2020 4 46 35.16667 11.58333 33.16667 23.58333
29 2020 5 48 37.16667 13.41667 35.16667 23.75000
30 2020 6 48 39.00000 15.25000 37.16667 23.75000
31 2020 7 50 40.83333 17.08333 39.00000 23.75000
32 2020 8 52 42.66667 19.00000 40.83333 23.66667
33 2020 9 52 44.33333 21.00000 42.66667 23.33333
34 2020 10 54 45.83333 23.16667 44.33333 22.66667
35 2020 11 56 47.50000 25.16667 45.83333 22.33333
36 2020 12 NA NA 27.16667 47.50000 NA
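The likely reason the original attempt returned only NA is the group_by(Monthavg) step: every Monthavg value is unique, so each group holds a single row and lag() has nothing to look back at. A minimal sketch on made-up data (the `df`, `id` and `x` names are illustrative, not from the question):

```r
library(dplyr)

df <- data.frame(id = 1:4, x = c(10, 20, 30, 40))

# Grouping by a column that is unique per row gives one-row groups,
# so lag() has no earlier row in the group and returns NA everywhere.
grouped <- df %>%
  group_by(id) %>%
  mutate(prev = lag(x)) %>%
  ungroup()

# Without the grouping, lag() looks back across the whole column.
ungrouped <- df %>%
  mutate(prev = lag(x))

grouped$prev    # NA NA NA NA
ungrouped$prev  # NA 10 20 30
```

Dropping the group_by()/ungroup() pair is what lets the pipeline above produce non-NA values.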
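I'm a bit paranoid. That is, no matter how many years of data we have, if there is even a small chance that a month is missing, then relying on lag(..., 12) is a bad idea. It is made worse by the fact that you get no warning or error, and your data will simply be wrong.
Because of that, I'd recommend a self-join.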
transmute(ces12f, year = as.character(as.integer(year) + 1L), month, lastravg = ravg) %>%
left_join(ces12f, ., by = c("year", "month"))
# Monthavg empavg year month date ravg lastravg
# 1 20185 2 2018 5 2018-05-01 NA NA
# 2 20186 4 2018 6 2018-06-01 NA NA
# 3 20187 6 2018 7 2018-07-01 NA NA
# 4 20188 7 2018 8 2018-08-01 NA NA
# 5 20189 8 2018 9 2018-09-01 NA NA
# 6 201810 10 2018 10 2018-10-01 NA NA
# 7 201811 12 2018 11 2018-11-01 NA NA
# 8 201812 14 2018 12 2018-12-01 NA NA
# 9 20191 16 2019 1 2019-01-01 NA NA
# 10 20192 18 2019 2 2019-02-01 NA NA
# 11 20193 20 2019 3 2019-03-01 NA NA
# 12 20194 22 2019 4 2019-04-01 11.58333 NA
# 13 20195 24 2019 5 2019-05-01 13.41667 NA
# 14 20196 26 2019 6 2019-06-01 15.25000 NA
# 15 20197 28 2019 7 2019-07-01 17.08333 NA
# 16 20198 30 2019 8 2019-08-01 19.00000 NA
# 17 20199 32 2019 9 2019-09-01 21.00000 NA
# 18 201910 36 2019 10 2019-10-01 23.16667 NA
# 19 201911 36 2019 11 2019-11-01 25.16667 NA
# 20 201912 38 2019 12 2019-12-01 27.16667 NA
# 21 20201 40 2020 1 2020-01-01 29.16667 NA
# 22 20202 42 2020 2 2020-02-01 31.16667 NA
# 23 20203 44 2020 3 2020-03-01 33.16667 NA
# 24 20204 46 2020 4 2020-04-01 35.16667 11.58333
# 25 20205 48 2020 5 2020-05-01 37.16667 13.41667
# 26 20206 48 2020 6 2020-06-01 39.00000 15.25000
# 27 20207 50 2020 7 2020-07-01 40.83333 17.08333
# 28 20208 52 2020 8 2020-08-01 42.66667 19.00000
# 29 20209 52 2020 9 2020-09-01 44.33333 21.00000
# 30 202010 54 2020 10 2020-10-01 45.83333 23.16667
# 31 202011 56 2020 11 2020-11-01 47.50000 25.16667
You can verify that each lastravg is the previous year's value, and then mutate the difference as usual, perhaps:
transmute(ces12f, year = as.character(as.integer(year) + 1L), month, lastravg = ravg) %>%
left_join(ces12f, ., by = c("year", "month")) %>%
mutate(AnnualDifference = ravg - lastravg)
# Monthavg empavg year month date ravg lastravg AnnualDifference
# 1 20185 2 2018 5 2018-05-01 NA NA NA
# 2 20186 4 2018 6 2018-06-01 NA NA NA
# 3 20187 6 2018 7 2018-07-01 NA NA NA
# 4 20188 7 2018 8 2018-08-01 NA NA NA
# 5 20189 8 2018 9 2018-09-01 NA NA NA
# 6 201810 10 2018 10 2018-10-01 NA NA NA
# 7 201811 12 2018 11 2018-11-01 NA NA NA
# 8 201812 14 2018 12 2018-12-01 NA NA NA
# 9 20191 16 2019 1 2019-01-01 NA NA NA
# 10 20192 18 2019 2 2019-02-01 NA NA NA
# 11 20193 20 2019 3 2019-03-01 NA NA NA
# 12 20194 22 2019 4 2019-04-01 11.58333 NA NA
# 13 20195 24 2019 5 2019-05-01 13.41667 NA NA
# 14 20196 26 2019 6 2019-06-01 15.25000 NA NA
# 15 20197 28 2019 7 2019-07-01 17.08333 NA NA
# 16 20198 30 2019 8 2019-08-01 19.00000 NA NA
# 17 20199 32 2019 9 2019-09-01 21.00000 NA NA
# 18 201910 36 2019 10 2019-10-01 23.16667 NA NA
# 19 201911 36 2019 11 2019-11-01 25.16667 NA NA
# 20 201912 38 2019 12 2019-12-01 27.16667 NA NA
# 21 20201 40 2020 1 2020-01-01 29.16667 NA NA
# 22 20202 42 2020 2 2020-02-01 31.16667 NA NA
# 23 20203 44 2020 3 2020-03-01 33.16667 NA NA
# 24 20204 46 2020 4 2020-04-01 35.16667 11.58333 23.58333
# 25 20205 48 2020 5 2020-05-01 37.16667 13.41667 23.75000
# 26 20206 48 2020 6 2020-06-01 39.00000 15.25000 23.75000
# 27 20207 50 2020 7 2020-07-01 40.83333 17.08333 23.75000
# 28 20208 52 2020 8 2020-08-01 42.66667 19.00000 23.66667
# 29 20209 52 2020 9 2020-09-01 44.33333 21.00000 23.33333
# 30 202010 54 2020 10 2020-10-01 45.83333 23.16667 22.66667
# 31 202011 56 2020 11 2020-11-01 47.50000 25.16667 22.33333
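To illustrate the concern about gaps, here is a small sketch on hypothetical data where March 2019 is missing. The `toy` frame and its `value` column (which simply encodes the period as year * 100 + month) are made up for the demonstration. It compares the positional lag with the key-based self-join.

```r
library(dplyr)

# Hypothetical toy data: 2019 has no March, 2020 is complete.
toy <- data.frame(
  year  = c(rep(2019L, 11), rep(2020L, 12)),
  month = c(setdiff(1:12, 3L), 1:12)
)
toy$value <- toy$year * 100 + toy$month  # e.g. 201902 for February 2019

# Positional approach: assumes the row 12 back is the same month last year.
by_lag <- toy %>% mutate(last = lag(value, 12))

# Key-based approach: shift the year forward and join on (year, month).
by_join <- toy %>%
  transmute(year = year + 1L, month, last = value) %>%
  left_join(toy, ., by = c("year", "month"))

subset(by_lag,  year == 2020 & month == 2)$last  # 201901: wrong month, no warning
subset(by_join, year == 2020 & month == 2)$last  # 201902: correct
subset(by_join, year == 2020 & month == 3)$last  # NA: the gap stays visible
```

With the join, a missing month shows up as an NA in exactly the affected row instead of silently shifting its neighbours.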
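A side note on this: it might be better to store year and month as integer, for these reasons: (1) it makes things like this very easy; (2) it preserves ordinality, whereas arrange(ces12f, month) on the character column will happily sort months as 1, 10, 11, 12, 2, etc.; (3) (subjective) they really are integers, after all.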
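A brief base R sketch of the ordering point, and of one way the Monthavg key could be split into integer columns. The three-element `Monthavg` vector here is a shortened stand-in for the real data, and the parsing assumes the first four digits are always the year.

```r
# Character months sort lexically, integer months sort numerically.
month_chr <- as.character(1:12)
head(sort(month_chr), 5)     # "1" "10" "11" "12" "2"
sort(as.integer(month_chr))  # 1 2 3 ... 12

# Splitting a YYYYM / YYYYMM key into integer year and month
Monthavg <- c(20185, 201810, 20191)
key   <- as.character(Monthavg)
year  <- as.integer(substr(key, 1, 4))
month <- as.integer(substr(key, 5, nchar(key)))

year      # 2018 2018 2019
month     # 5 10 1
year + 1L # shifting a year is plain integer arithmetic
```

With integer columns, the year shift in the self-join becomes year + 1L, with no as.character(as.integer(...)) round trip.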