Pulling lagged data from a particular season, but separately for each data set indicated by a variable, in R
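My original query came from this question:
That answered my question for a single data frame; however, I now have a large aggregated data frame and need to add a line of code that accounts for each individual data set (Lake_name).
Here is my data: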
SeasonYear change Lake_name
1 winter2020 0.007877245 AlanHenry
2 spring2020 0.058515310 AlanHenry
3 summer2020 0.013850687 AlanHenry
4 fall2020 -0.071774781 AlanHenry
5 winter2021 -0.040268206 AlanHenry
6 spring2021 -0.020803715 AlanHenry
7 summer2021 0.181610974 AlanHenry
8 winter2020 -0.029708916 Amistad
9 spring2020 -0.063310371 Amistad
10 summer2020 -0.054231575 Amistad
11 fall2020 0.016057252 Amistad
12 winter2021 0.011785717 Amistad
13 spring2021 -0.030677687 Amistad
14 summer2021 -0.015691720 Amistad
15 winter2020 -0.011974634 AmonGCarter
16 spring2020 0.168774234 AmonGCarter
17 summer2020 -0.041486735 AmonGCarter
18 fall2020 -0.095134974 AmonGCarter
19 winter2021 -0.030310177 AmonGCarter
20 spring2021 0.033528325 AmonGCarter
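I am trying to build a function that pulls the lag from the previous spring (see the earlier post) but also accounts for each lake. I can do this if I split the data up lake by lake, but I have a very large data set, so that would take a long time. This is the code I tried (modified from the post I referenced):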
library(dplyr)
lag_spring <- function(x, y, n = 1) {
  data.frame(x = x, season_year = y) %>%
    group_by(Lake_name) %>%
    tidyr::extract(season_year, into = c("season", "year"), regex = "^(.+?)(\\d{4})$") %>%
    group_by(year) %>%
    mutate(springmean = x[season == "spring"]) %>%
    ungroup() %>%
    group_by(season) %>%
    mutate(lag = ifelse(!season %in% c("summer", "fall"), lag(springmean, n = n), lag(springmean, n = n - 1))) %>%
    ungroup() %>%
    pull(lag)
}
I tried adding group_by(Lake_name) so that this is done within each lake, but when I run the code:
data %>% mutate(springlag = lag_spring(change, SeasonYear,n=1),
springlag2 = lag_spring(change, SeasonYear,n=2),
springlag3 = lag_spring(change, SeasonYear,n=3))
I get this error:
Error: Problem with mutate() input springlag.
x Must group by variables found in .data.
Column Lake_name is not found.
i Input springlag is lag_spring(change, SeasonYear, n = 1)
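Can someone help me modify the code I was given earlier so that it still returns the "spring lag", but with a dplyr line that makes it do so only within each individual lake?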
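There is no need to change the function. The error occurs because the function builds a new data frame from x and y only, so there is no Lake_name column inside it to group by. Instead, use group_by before the mutate that computes the lag to get the result you want: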
library(tidyr)
library(dplyr)
lag_spring <- function(x, y, n = 1) {
  data.frame(x = x, season_year = y) %>%
    # split e.g. "spring2020" into season = "spring" and year = "2020"
    tidyr::extract(season_year, into = c("season", "year"), regex = "^(.+?)(\\d{4})$") %>%
    group_by(year) %>%
    # spring value of each year, or NA if that year has no spring row
    mutate(springmean = if (any(season == "spring")) x[season == "spring"] else NA) %>%
    ungroup() %>%
    group_by(season) %>%
    # summer and fall come after spring, so they lag by one year less
    mutate(lag = ifelse(!season %in% c("summer", "fall"), lag(springmean, n = n), lag(springmean, n = n - 1))) %>%
    ungroup() %>%
    pull(lag)
}
dd %>%
  group_by(Lake_name) %>%
  mutate(lag = lag_spring(change, SeasonYear))
#> # A tibble: 20 × 4
#> # Groups: Lake_name [3]
#> SeasonYear change Lake_name lag
#> <chr> <dbl> <chr> <dbl>
#> 1 winter2020 0.00788 AlanHenry NA
#> 2 spring2020 0.0585 AlanHenry NA
#> 3 summer2020 0.0139 AlanHenry 0.0585
#> 4 fall2020 -0.0718 AlanHenry 0.0585
#> 5 winter2021 -0.0403 AlanHenry 0.0585
#> 6 spring2021 -0.0208 AlanHenry 0.0585
#> 7 summer2021 0.182 AlanHenry -0.0208
#> 8 winter2020 -0.0297 Amistad NA
#> 9 spring2020 -0.0633 Amistad NA
#> 10 summer2020 -0.0542 Amistad -0.0633
#> 11 fall2020 0.0161 Amistad -0.0633
#> 12 winter2021 0.0118 Amistad -0.0633
#> 13 spring2021 -0.0307 Amistad -0.0633
#> 14 summer2021 -0.0157 Amistad -0.0307
#> 15 winter2020 -0.0120 AmonGCarter NA
#> 16 spring2020 0.169 AmonGCarter NA
#> 17 summer2020 -0.0415 AmonGCarter 0.169
#> 18 fall2020 -0.0951 AmonGCarter 0.169
#> 19 winter2021 -0.0303 AmonGCarter 0.169
#> 20 spring2021 0.0335 AmonGCarter 0.169
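To illustrate why grouping outside the function is enough, here is a minimal sketch. The data frame toy and the helper shift_one are made up for illustration and are not part of the question. A function called inside mutate only receives the column vectors it is given; after group_by, dplyr calls it once per group with that group's slice, so the lag never crosses from one lake into the next.

```r
library(dplyr)

# Hypothetical toy data: two lakes with a few values each
toy <- data.frame(
  lake  = c("A", "A", "A", "B", "B"),
  value = c(1, 2, 3, 10, 20)
)

# Hypothetical helper: it only ever sees the vector passed to it
shift_one <- function(x) dplyr::lag(x, n = 1)

# Ungrouped: the helper gets the whole column, so lake B's first row
# picks up lake A's last value
ungrouped <- toy %>%
  mutate(prev = shift_one(value))

# Grouped: the helper is called once per lake with that lake's values only
grouped <- toy %>%
  group_by(lake) %>%
  mutate(prev = shift_one(value)) %>%
  ungroup()

ungrouped$prev  # NA  1  2  3 10
grouped$prev    # NA  1  2 NA 10
```

The same mechanism applies to lag_spring: inside a grouped mutate, change and SeasonYear arrive already restricted to one lake.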
Data
dd <- structure(list(SeasonYear = c(
"winter2020", "spring2020", "summer2020",
"fall2020", "winter2021", "spring2021", "summer2021", "winter2020",
"spring2020", "summer2020", "fall2020", "winter2021", "spring2021",
"summer2021", "winter2020", "spring2020", "summer2020", "fall2020",
"winter2021", "spring2021"
), change = c(
0.007877245, 0.05851531,
0.013850687, -0.071774781, -0.040268206, -0.020803715, 0.181610974,
-0.029708916, -0.063310371, -0.054231575, 0.016057252, 0.011785717,
-0.030677687, -0.01569172, -0.011974634, 0.168774234, -0.041486735,
-0.095134974, -0.030310177, 0.033528325
), Lake_name = c(
"AlanHenry",
"AlanHenry", "AlanHenry", "AlanHenry", "AlanHenry", "AlanHenry",
"AlanHenry", "Amistad", "Amistad", "Amistad", "Amistad", "Amistad",
"Amistad", "Amistad", "AmonGCarter", "AmonGCarter", "AmonGCarter",
"AmonGCarter", "AmonGCarter", "AmonGCarter"
)), class = "data.frame", row.names = c(
"1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20"
))
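The question asked for several lag columns at once (springlag, springlag2, ...). A sketch of how that might look with the grouped approach follows, assuming the lag_spring definition from the answer (repeated here so the block runs on its own) and a two-lake subset of the question's data stored under the made-up name lakes.

```r
library(dplyr)
library(tidyr)

# lag_spring() as in the answer, repeated so this sketch is self-contained
lag_spring <- function(x, y, n = 1) {
  data.frame(x = x, season_year = y) %>%
    tidyr::extract(season_year, into = c("season", "year"), regex = "^(.+?)(\\d{4})$") %>%
    group_by(year) %>%
    mutate(springmean = if (any(season == "spring")) x[season == "spring"] else NA) %>%
    ungroup() %>%
    group_by(season) %>%
    mutate(lag = ifelse(!season %in% c("summer", "fall"),
                        lag(springmean, n = n),
                        lag(springmean, n = n - 1))) %>%
    ungroup() %>%
    pull(lag)
}

# Two lakes taken from the question's data (the name `lakes` is an assumption)
lakes <- data.frame(
  SeasonYear = rep(c("winter2020", "spring2020", "summer2020", "fall2020",
                     "winter2021", "spring2021", "summer2021"), times = 2),
  change = c(0.007877245, 0.058515310, 0.013850687, -0.071774781,
             -0.040268206, -0.020803715, 0.181610974,
             -0.029708916, -0.063310371, -0.054231575, 0.016057252,
             0.011785717, -0.030677687, -0.015691720),
  Lake_name = rep(c("AlanHenry", "Amistad"), each = 7)
)

# Group once, then compute each lag inside the same mutate
result <- lakes %>%
  group_by(Lake_name) %>%
  mutate(springlag  = lag_spring(change, SeasonYear, n = 1),
         springlag2 = lag_spring(change, SeasonYear, n = 2)) %>%
  ungroup()

result
```

With only two years per lake, springlag2 is NA everywhere except the summer2021 rows, which reach back to spring 2020; longer series would fill in more values.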