Fill leading NAs in a moving average with the values from the original column

library(dplyr)

set.seed(123)
df <- data.frame(loc.id = rep(1:3, each = 4 * 10),
                 year = rep(rep(1980:1983, each = 10), times = 3),
                 day = rep(1:10, times = 3 * 4),
                 x = sample(123:200, 4 * 3 * 10, replace = TRUE))

I want to add another column, x.mv, holding the 3-day moving average of x for each combination of loc.id and year:
df %>% group_by(loc.id, year) %>% mutate(x.mv = zoo::rollmean(x, 3, fill = NA, align = "right"))

   loc.id  year   day     x  x.mv
    <int> <int> <int> <int> <dbl>
 1      1  1980     1   145   NA
 2      1  1980     2   184   NA
 3      1  1980     3   154  161
 4      1  1980     4   191  176.
 5      1  1980     5   196  180.
 6      1  1980     6   126  171
 7      1  1980     7   164  162
 8      1  1980     8   192  161.
 9      1  1980     9   166  174
10      1  1980    10   158  172

What I want is to replace the NAs in the x.mv column with the corresponding values of x. I tried this:

df %>% group_by(loc.id,year) %>% mutate(x.mv = zoo::rollmean(x, 3, fill = x[1:2], align = "right"))

   loc.id  year   day     x  x.mv
    <int> <int> <int> <int> <dbl>
 1      1  1980     1   145  145
 2      1  1980     2   184  145
 3      1  1980     3   154  161
 4      1  1980     4   191  176.
 5      1  1980     5   196  180.
 6      1  1980     6   126  171
 7      1  1980     7   164  162
 8      1  1980     8   192  161.
 9      1  1980     9   166  174
10      1  1980    10   158  172

But this fills the NAs with the first value of x rather than with the corresponding values of x. How can I fix this?

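The fill argument takes one value each for the left, interior, and right of the series (shorter vectors are recycled), not one value per missing position, which is why both NAs received x[1]. Skip the fill argument and fill manually: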

df %>%
  group_by(loc.id,year) %>%
  mutate(x.mv = c(x[1:2],zoo::rollmean(x, 3, align = "right"))) %>%
  ungroup

# # A tibble: 120 x 5
#   loc.id  year   day     x     x.mv
#    <int> <int> <int> <int>    <dbl>
# 1      1  1980     1   145 145.0000
# 2      1  1980     2   184 184.0000
# 3      1  1980     3   154 161.0000
# 4      1  1980     4   191 176.3333
# 5      1  1980     5   196 180.3333
# 6      1  1980     6   126 171.0000
# 7      1  1980     7   164 162.0000
# 8      1  1980     8   192 160.6667
# 9      1  1980     9   166 174.0000
# 10     1  1980    10   158 172.0000
# # ... with 110 more rows

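You might want to use dplyr::cummean(x[1:2]) instead of x[1:2], so that the second value is already an average of the first two. Alternatively, in this case, follow @g-grothendieck's suggestion from the comments and rewrite your call as mutate(x.mv = zoo::rollapplyr(x, 3, mean, partial = TRUE)).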
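As a rough check of how the two suggestions relate, here is a minimal sketch on a plain vector. It assumes the ten x values of the first group shown in the output above (loc.id 1, year 1980); the names manual and partial are only illustrative. With a window of 3, the cummean() variant and the partial = TRUE variant should give the same result, with 145 and 164.5 in the first two positions.

```r
library(dplyr)  # cummean()
library(zoo)    # rollmean(), rollapplyr()

# x values of the first group (loc.id 1, year 1980), copied from the output above
x <- c(145, 184, 154, 191, 196, 126, 164, 192, 166, 158)

# Manual fill: cumulative means for the first two positions,
# followed by the right-aligned 3-day means
manual <- c(cummean(x[1:2]), rollmean(x, 3, align = "right"))

# Partial windows: the window shrinks to 1 and 2 observations at the start
partial <- rollapplyr(x, 3, mean, partial = TRUE)

all.equal(manual, partial)
head(partial, 3)
```

The rollapplyr() form adapts automatically if the window width changes, whereas the manual form needs x[1:2] adjusted to the first width - 1 values.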