Pulling lagged data but only for a particular season in R

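I have a dataset with two variables. One is numeric; the other is a character string identifying the season and year the numeric value comes from. Here is the head of the data: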

   SeasonYear  mean
   <chr>      <dbl>
 1 winter2000 0.957
 2 spring2000 0.943
 3 summer2000 1.03 
 4 fall2000   0.981
 5 winter2001 1.06 
 6 spring2001 1.05 
 7 summer2001 1.02 
 8 fall2001   1.03 
 9 winter2002 1.02 
10 spring2002 1.05 

Now I want to pull a lag of this data, but only from the previous spring, so that my data looks like this:

   SeasonYear  mean     lag
   <chr>      <dbl>   <dbl>
 1 winter2000 0.957   NA
 2 spring2000 0.943   NA
 3 summer2000 1.03    0.943
 4 fall2000   0.981   0.943
 5 winter2001 1.06    0.943
 6 spring2001 1.05    0.943
 7 summer2001 1.02    1.05
 8 fall2001   1.03    1.05
 9 winter2002 1.02    1.05
10 spring2002 1.05    1.05

I would also like to go back 2 springs, so that my data looks like this:

   SeasonYear  mean     lag
   <chr>      <dbl>   <dbl>
 1 winter2000 0.957   NA
 2 spring2000 0.943   NA
 3 summer2000 1.03    NA
 4 fall2000   0.981   NA
 5 winter2001 1.06    NA
 6 spring2001 1.05    NA
 7 summer2001 1.02    0.943
 8 fall2001   1.03    0.943
 9 winter2002 1.02    0.943
10 spring2002 1.05    0.943

I know I can use the lag() function to get previous rows in a data frame, but I am looking for a way to write a function that pulls this specific kind of lag.

One option to achieve your desired result could look like this:

  1. Split your SeasonYear into season and year
  2. Add a column containing the spring value for each year
  3. Get the n-th lag, taking into account that summer and fall need the (n-1)-th lag
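To make the first step concrete, here is a minimal base R sketch of the split on its own. The labels vector is a made-up sample in the same shape as SeasonYear, and sub() with perl = TRUE is used here only for illustration; the answer itself relies on tidyr::extract().

```r
# Hypothetical sample labels in the same "<season><year>" shape as SeasonYear
labels <- c("winter2000", "spring2000", "fall2001")

# Lazy season name followed by a four-digit year
pattern <- "^(.+?)(\\d{4})$"

# Keep the first capture group (season) and the second (year) respectively
season <- sub(pattern, "\\1", labels, perl = TRUE)
year <- sub(pattern, "\\2", labels, perl = TRUE)

data.frame(season = season, year = year)
```

Note that the backslash in \d has to be doubled inside an R string literal.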
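
library(tidyr)
library(dplyr)

# x: numeric values, y: "<season><year>" labels, n: number of springs to go back
lag_spring <- function(x, y, n = 1) {
  data.frame(x = x, season_year = y) %>%
    # Split e.g. "winter2000" into season = "winter" and year = "2000"
    tidyr::extract(season_year, into = c("season", "year"), regex = "^(.+?)(\\d{4})$") %>%
    group_by(year) %>%
    # Attach each year's spring value to every row of that year
    mutate(springmean = x[season == "spring"]) %>%
    ungroup() %>%
    group_by(season) %>%
    # Summer and fall follow spring within the same year, so they need one lag less
    mutate(lag = ifelse(!season %in% c("summer", "fall"), lag(springmean, n = n), lag(springmean, n = n - 1))) %>%
    ungroup() %>%
    pull(lag)
}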

dd %>%
  mutate(lag = lag_spring(mean, SeasonYear))
#>    SeasonYear  mean   lag
#> 1  winter2000 0.957    NA
#> 2  spring2000 0.943    NA
#> 3  summer2000 1.030 0.943
#> 4    fall2000 0.981 0.943
#> 5  winter2001 1.060 0.943
#> 6  spring2001 1.050 0.943
#> 7  summer2001 1.020 1.050
#> 8    fall2001 1.030 1.050
#> 9  winter2002 1.020 1.050
#> 10 spring2002 1.050 1.050

dd %>%
  mutate(lag = lag_spring(mean, SeasonYear, n = 2))
#>    SeasonYear  mean   lag
#> 1  winter2000 0.957    NA
#> 2  spring2000 0.943    NA
#> 3  summer2000 1.030    NA
#> 4    fall2000 0.981    NA
#> 5  winter2001 1.060    NA
#> 6  spring2001 1.050    NA
#> 7  summer2001 1.020 0.943
#> 8    fall2001 1.030 0.943
#> 9  winter2002 1.020 0.943
#> 10 spring2002 1.050 0.943
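
As a cross-check, the same lag can also be sketched in base R without any grouping. This assumes the rows are already in chronological order and that spring rows are labelled with the prefix "spring". The helper name lag_spring_base is made up for illustration and is not part of any package: it counts how many springs lie strictly before each row and uses that count to index into the vector of spring values.

```r
# Values copied from the question's data; rows are assumed to be in time order
season_year <- c("winter2000", "spring2000", "summer2000", "fall2000",
                 "winter2001", "spring2001", "summer2001", "fall2001",
                 "winter2002", "spring2002")
mean_vals <- c(0.957, 0.943, 1.03, 0.981, 1.06, 1.05, 1.02, 1.03, 1.02, 1.05)

# lag_spring_base is a hypothetical helper name used only in this sketch
lag_spring_base <- function(x, y, n = 1) {
  is_spring <- startsWith(y, "spring")
  spring_vals <- x[is_spring]
  # Number of spring rows strictly before each row
  springs_before <- cumsum(is_spring) - is_spring
  # Position of the n-th most recent earlier spring within spring_vals
  idx <- springs_before - n + 1
  idx[idx < 1] <- NA  # not enough earlier springs yet
  spring_vals[idx]
}

lag1 <- lag_spring_base(mean_vals, season_year)
lag2 <- lag_spring_base(mean_vals, season_year, n = 2)
```

This version does not depend on every year containing a spring row, but it relies entirely on the row order, so sort the data first if that is not guaranteed.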

DATA

dd <- structure(list(SeasonYear = c(
  "winter2000", "spring2000", "summer2000",
  "fall2000", "winter2001", "spring2001", "summer2001", "fall2001",
  "winter2002", "spring2002"
), mean = c(
  0.957, 0.943, 1.03, 0.981,
  1.06, 1.05, 1.02, 1.03, 1.02, 1.05
)), class = "data.frame", row.names = c(
  "1",
  "2", "3", "4", "5", "6", "7", "8", "9", "10"
))