How to produce a year-over-year calculated column in R

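First the data, then the manipulation, and finally the approach I'm currently using, which so far produces no usable data. The manipulation creates a date column and then a rolling 12-month average.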

  Monthavg <- c(20185, 20186, 20187, 20188, 20189, 201810, 201811, 201812,
                20191, 20192, 20193, 20194, 20195, 20196, 20197, 20198, 20199, 201910, 201911, 201912,
                20201, 20202, 20203, 20204, 20205, 20206, 20207, 20208, 20209, 202010, 202011)

  empavg <- c(2, 4, 6, 7, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30,
              32, 36, 36, 38, 40, 42, 44, 46, 48, 48, 50, 52, 52, 54, 56)

  ces12f <- data.frame(Monthavg, empavg)

Manipulation

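 library(dplyr)
 # split e.g. "20185" / "201810" into a 4-digit year and a 1-2 digit month, then build a Date
 ces12f<- ces12f %>% mutate(year = substr(as.character(Monthavg),1,4),
              month = substr(as.character(Monthavg),5,7),
              date = as.Date(paste(year,month,"1",sep ="-")))
 Month_ord <- order(Monthavg)
 span_month=12
 # trailing (right-aligned) 12-month mean; the first 11 rows are NA
 ces12f<-ces12f %>% mutate(ravg = zoo::rollmeanr(empavg, 12, fill = NA))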
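For reference, zoo::rollmeanr(x, 12, fill = NA) is a right-aligned (trailing) mean: each value averages the current month and the 11 before it, and the first 11 rows are padded with NA. Below is a base-R sketch of the same computation using stats::filter in place of zoo (a swapped-in technique, not the asker's code). It assumes the rows are sorted by month with no gaps, and it also reproduces the 202011 minus 201911 difference the question is after.

```r
# Trailing 12-month mean in base R, meant to match zoo::rollmeanr(x, 12, fill = NA)
empavg <- c(2, 4, 6, 7, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30,
            32, 36, 36, 38, 40, 42, 44, 46, 48, 48, 50, 52, 52, 54, 56)

# sides = 1 averages the current value and the 11 before it,
# so the first 11 results are NA, as with fill = NA
ravg_base <- as.numeric(stats::filter(empavg, rep(1 / 12, 12), sides = 1))

# element 19 is 201911 and element 31 is 202011 in the question's data
ravg_base[c(12, 19, 31)]        # 11.58333 25.16667 47.50000
ravg_base[31] - ravg_base[19]   # 22.33333
```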

Year-over-year difference attempt

 ces12f<- ces12f%>%
 group_by(Monthavg)%>%
 mutate(PreviousYear=lag(ravg,12), 
     PreviousMonth=lag(ravg),
     AnnualDifference=ravg-PreviousYear)%>%
 ungroup()

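The end goal is 202011 minus 201911, i.e. 47.5 minus 25.17, or 22.3. The approach I used above produces only NAs. Any insight on how to modify the existing code, or on a completely different approach, would be greatly appreciated.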
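The NAs most likely come from group_by(Monthavg): every Monthavg value is unique, so each row becomes its own one-row group, and lag() only looks back within a group, so there is never an earlier row to return. A minimal illustration with a made-up three-row data frame (df, id and x are hypothetical names):

```r
library(dplyr)

# hypothetical toy data: every id is unique, like Monthavg
df <- data.frame(id = 1:3, x = c(10, 20, 30))

# grouped by a unique key: each group has one row, so lag() is always NA
grouped <- df %>%
  group_by(id) %>%
  mutate(prev = dplyr::lag(x)) %>%
  ungroup()

# ungrouped (rows already in order): lag() returns the previous row's value
ungrouped <- df %>%
  mutate(prev = dplyr::lag(x))

grouped$prev    # NA NA NA
ungrouped$prev  # NA 10 20
```

On the question's data, dropping the group_by()/ungroup() lines should give 22.33333 for 202011, but only because that particular series happens to have no missing months.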

Here is an approach using tidyr::extract. You can use tidyr::complete to make sure any missing months are filled in:

library(tidyverse)
library(zoo)
ces12f %>%
  mutate(Monthavg = as.character(Monthavg)) %>%
  # split e.g. "20185" into Year = "2018" and Month = "5"
  extract(Monthavg, into = c("Year", "Month"),
          regex = "^([0-9]{4})([0-9]{1,2})$") %>%
  mutate(across(Year:Month, as.integer)) %>%
  arrange(Year,Month) %>%
  # add an all-NA row for every Year/Month combination not already present
  complete(Year, Month) %>%
  mutate(ravg = zoo::rollmeanr(empavg, 12, fill = NA)) %>%
  mutate(PreviousYear=lag(ravg,12), 
         PreviousMonth=lag(ravg),
         AnnualDifference=ravg-PreviousYear)
   Year Month empavg     ravg PreviousYear PreviousMonth AnnualDifference
1  2018     1     NA       NA           NA            NA               NA
2  2018     2     NA       NA           NA            NA               NA
3  2018     3     NA       NA           NA            NA               NA
4  2018     4     NA       NA           NA            NA               NA
5  2018     5      2       NA           NA            NA               NA
6  2018     6      4       NA           NA            NA               NA
7  2018     7      6       NA           NA            NA               NA
8  2018     8      7       NA           NA            NA               NA
9  2018     9      8       NA           NA            NA               NA
10 2018    10     10       NA           NA            NA               NA
11 2018    11     12       NA           NA            NA               NA
12 2018    12     14       NA           NA            NA               NA
13 2019     1     16       NA           NA            NA               NA
14 2019     2     18       NA           NA            NA               NA
15 2019     3     20       NA           NA            NA               NA
16 2019     4     22 11.58333           NA            NA               NA
17 2019     5     24 13.41667           NA      11.58333               NA
18 2019     6     26 15.25000           NA      13.41667               NA
19 2019     7     28 17.08333           NA      15.25000               NA
20 2019     8     30 19.00000           NA      17.08333               NA
21 2019     9     32 21.00000           NA      19.00000               NA
22 2019    10     36 23.16667           NA      21.00000               NA
23 2019    11     36 25.16667           NA      23.16667               NA
24 2019    12     38 27.16667           NA      25.16667               NA
25 2020     1     40 29.16667           NA      27.16667               NA
26 2020     2     42 31.16667           NA      29.16667               NA
27 2020     3     44 33.16667           NA      31.16667               NA
28 2020     4     46 35.16667     11.58333      33.16667         23.58333
29 2020     5     48 37.16667     13.41667      35.16667         23.75000
30 2020     6     48 39.00000     15.25000      37.16667         23.75000
31 2020     7     50 40.83333     17.08333      39.00000         23.75000
32 2020     8     52 42.66667     19.00000      40.83333         23.66667
33 2020     9     52 44.33333     21.00000      42.66667         23.33333
34 2020    10     54 45.83333     23.16667      44.33333         22.66667
35 2020    11     56 47.50000     25.16667      45.83333         22.33333
36 2020    12     NA       NA     27.16667      47.50000               NA
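One caveat, going by the tidyr documentation for expand()/complete(): complete(Year, Month) only combines values that already appear somewhere in the data. It fills the 2018 and 2020 gaps above because every month from 1 to 12 occurs in at least one year, but a calendar month absent from every year would not be added. Supplying the full range explicitly (Month = 1:12) avoids that. A small sketch with made-up data (df and value are hypothetical):

```r
library(tidyr)

# hypothetical data: March is missing from the only year present
df <- data.frame(Year = 2020L, Month = c(1L, 2L, 4L), value = c(10, 20, 40))

# combines only the values seen in the data: still 3 rows, March is not added
seen_only <- complete(df, Year, Month)

# full month range given explicitly: 12 rows, March gets value = NA
full_year <- complete(df, Year, Month = 1:12)

nrow(seen_only)                        # 3
nrow(full_year)                        # 12
full_year$value[full_year$Month == 3]  # NA
```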

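I'm paranoid. No matter how many years of data you have, if there is even a small chance that a month is missing, then lag(..., 12) is a bad idea. Worse, you won't get any warning or error; your results will simply be wrong.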
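To make the risk concrete, here is a sketch with hypothetical monthly labels in which one month (2020-03) is missing. Because lag() counts rows rather than months, a lag of 12 pairs 2020-11 with 2019-10 instead of 2019-11, and nothing warns you:

```r
library(dplyr)

# hypothetical monthly series, 2019-01 to 2020-12, with 2020-03 dropped
months <- format(seq(as.Date("2019-01-01"), as.Date("2020-12-01"), by = "month"), "%Y-%m")
df <- data.frame(ym = months[months != "2020-03"], stringsAsFactors = FALSE)

# lag() counts rows, not calendar months
df <- df %>% mutate(ym_12_rows_back = dplyr::lag(ym, 12))

df[df$ym == "2020-11", ]
# ym_12_rows_back is "2019-10": one month off, with no warning or error
```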

So I recommend a self-join.

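# assumes ces12f already has the year, month and ravg columns from the question's code;
# shift every row forward one year, then join it back so each row picks up the
# rolling average from the same calendar month one year earlier
transmute(ces12f, year = as.character(as.integer(year) + 1L), month, lastravg = ravg) %>%
  left_join(ces12f, ., by = c("year", "month"))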
#    Monthavg empavg year month       date     ravg lastravg
# 1     20185      2 2018     5 2018-05-01       NA       NA
# 2     20186      4 2018     6 2018-06-01       NA       NA
# 3     20187      6 2018     7 2018-07-01       NA       NA
# 4     20188      7 2018     8 2018-08-01       NA       NA
# 5     20189      8 2018     9 2018-09-01       NA       NA
# 6    201810     10 2018    10 2018-10-01       NA       NA
# 7    201811     12 2018    11 2018-11-01       NA       NA
# 8    201812     14 2018    12 2018-12-01       NA       NA
# 9     20191     16 2019     1 2019-01-01       NA       NA
# 10    20192     18 2019     2 2019-02-01       NA       NA
# 11    20193     20 2019     3 2019-03-01       NA       NA
# 12    20194     22 2019     4 2019-04-01 11.58333       NA
# 13    20195     24 2019     5 2019-05-01 13.41667       NA
# 14    20196     26 2019     6 2019-06-01 15.25000       NA
# 15    20197     28 2019     7 2019-07-01 17.08333       NA
# 16    20198     30 2019     8 2019-08-01 19.00000       NA
# 17    20199     32 2019     9 2019-09-01 21.00000       NA
# 18   201910     36 2019    10 2019-10-01 23.16667       NA
# 19   201911     36 2019    11 2019-11-01 25.16667       NA
# 20   201912     38 2019    12 2019-12-01 27.16667       NA
# 21    20201     40 2020     1 2020-01-01 29.16667       NA
# 22    20202     42 2020     2 2020-02-01 31.16667       NA
# 23    20203     44 2020     3 2020-03-01 33.16667       NA
# 24    20204     46 2020     4 2020-04-01 35.16667 11.58333
# 25    20205     48 2020     5 2020-05-01 37.16667 13.41667
# 26    20206     48 2020     6 2020-06-01 39.00000 15.25000
# 27    20207     50 2020     7 2020-07-01 40.83333 17.08333
# 28    20208     52 2020     8 2020-08-01 42.66667 19.00000
# 29    20209     52 2020     9 2020-09-01 44.33333 21.00000
# 30   202010     54 2020    10 2020-10-01 45.83333 23.16667
# 31   202011     56 2020    11 2020-11-01 47.50000 25.16667
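Here is the same hypothetical gap handled with the self-join. Because rows are matched on (year, month) rather than on position, 2020-11 is still paired with 2019-11 even though 2020-03 is missing. This sketch uses integer year and month columns, and value is a made-up series:

```r
library(dplyr)

# hypothetical data: 2019-01 to 2020-12 with value 1..24, then 2020-03 dropped
grid <- expand.grid(month = 1:12, year = 2019:2020)
df <- data.frame(year = grid$year, month = grid$month, value = seq_len(nrow(grid)))
df <- df[!(df$year == 2020 & df$month == 3), ]

# shift each row forward one year, then match on the calendar keys
prev <- transmute(df, year = year + 1L, month, lastvalue = value)
out <- left_join(df, prev, by = c("year", "month"))

out[out$year == 2020 & out$month %in% c(4, 11), ]
# 2020-04 gets lastvalue 4 (from 2019-04) and 2020-11 gets 11 (from 2019-11)
```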

You can verify that each lastravg is the previous year's value; the difference is then an ordinary mutate, perhaps:

transmute(ces12f, year = as.character(as.integer(year) + 1L), month, lastravg = ravg) %>%
  left_join(ces12f, ., by = c("year", "month")) %>%
  mutate(AnnualDifference = ravg - lastravg)
#    Monthavg empavg year month       date     ravg lastravg AnnualDifference
# 1     20185      2 2018     5 2018-05-01       NA       NA               NA
# 2     20186      4 2018     6 2018-06-01       NA       NA               NA
# 3     20187      6 2018     7 2018-07-01       NA       NA               NA
# 4     20188      7 2018     8 2018-08-01       NA       NA               NA
# 5     20189      8 2018     9 2018-09-01       NA       NA               NA
# 6    201810     10 2018    10 2018-10-01       NA       NA               NA
# 7    201811     12 2018    11 2018-11-01       NA       NA               NA
# 8    201812     14 2018    12 2018-12-01       NA       NA               NA
# 9     20191     16 2019     1 2019-01-01       NA       NA               NA
# 10    20192     18 2019     2 2019-02-01       NA       NA               NA
# 11    20193     20 2019     3 2019-03-01       NA       NA               NA
# 12    20194     22 2019     4 2019-04-01 11.58333       NA               NA
# 13    20195     24 2019     5 2019-05-01 13.41667       NA               NA
# 14    20196     26 2019     6 2019-06-01 15.25000       NA               NA
# 15    20197     28 2019     7 2019-07-01 17.08333       NA               NA
# 16    20198     30 2019     8 2019-08-01 19.00000       NA               NA
# 17    20199     32 2019     9 2019-09-01 21.00000       NA               NA
# 18   201910     36 2019    10 2019-10-01 23.16667       NA               NA
# 19   201911     36 2019    11 2019-11-01 25.16667       NA               NA
# 20   201912     38 2019    12 2019-12-01 27.16667       NA               NA
# 21    20201     40 2020     1 2020-01-01 29.16667       NA               NA
# 22    20202     42 2020     2 2020-02-01 31.16667       NA               NA
# 23    20203     44 2020     3 2020-03-01 33.16667       NA               NA
# 24    20204     46 2020     4 2020-04-01 35.16667 11.58333         23.58333
# 25    20205     48 2020     5 2020-05-01 37.16667 13.41667         23.75000
# 26    20206     48 2020     6 2020-06-01 39.00000 15.25000         23.75000
# 27    20207     50 2020     7 2020-07-01 40.83333 17.08333         23.75000
# 28    20208     52 2020     8 2020-08-01 42.66667 19.00000         23.66667
# 29    20209     52 2020     9 2020-09-01 44.33333 21.00000         23.33333
# 30   202010     54 2020    10 2020-10-01 45.83333 23.16667         22.66667
# 31   202011     56 2020    11 2020-11-01 47.50000 25.16667         22.33333

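A side note: it's probably better to store year and month as integers, for several reasons: (1) it makes this kind of thing very easy; (2) it preserves the ordinal order, whereas arrange(ces12f, month) will happily sort the months as 1, 10, 11, 12, 2, and so on; (3) (subjectively) they really are integers, after all.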
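Point (2) is easy to check in base R, and an integer year also turns the one-year shift into plain addition (the vectors here are made up):

```r
months_chr <- as.character(c(1, 2, 10, 11, 12))

sort(months_chr)              # "1" "10" "11" "12" "2"  (text order)
sort(as.integer(months_chr))  # 1 2 10 11 12            (numeric order)

# with an integer year, shifting by one year needs no character round-trip
year <- 2019L
year + 1L                     # 2020
```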