从不同年份开始并具有不同 window 长度的时间序列的移动平均线
Moving average along a time series starting at different years and with different window length
我有一个包含两个变量的数据框。
第一个变量是从 1 到 2016 连续跨越的年份,第二个变量是我的兴趣值。例如
ts <- data.frame(Year=c(1:2016), TS=sample(seq(from=-1.5, to=1.5,by=0.01), size=2016, replace=TRUE))
我需要计算从上次观察(2016-50=1966)前 50 年开始的 21 个移动平均值,每个都包含 window,范围从 10、11、12、...30 年,第一个值与 2016 年对齐。
我需要的是一个包含 23 列(“年”、“TS”、“av10”、“av11”、“av12”、...“av30”)和从 1 年到 2016 年的 2016 行的数据框。
因此,我这样做了:
subset1966 <- ts[1:1966,]
subset1966$Year.50 <- c(51:2016)
ts.50yrs <- subset1966%>%
mutate(av10 = rollmean(TS, k = 10, fill = NA, align = "right"),
av11 = rollmean(TS, k = 11, fill = NA, align = "right"),
av12 = rollmean(TS, k = 12, fill = NA, align = "right"),
av13 = rollmean(TS, k = 13, fill = NA, align = "right"),
av14 = rollmean(TS, k = 14, fill = NA, align = "right"),
av15 = rollmean(TS, k = 15, fill = NA, align = "right"),
av16 = rollmean(TS, k = 16, fill = NA, align = "right"),
av17 = rollmean(TS, k = 17, fill = NA, align = "right"),
av18 = rollmean(TS, k = 18, fill = NA, align = "right"),
av19 = rollmean(TS, k = 19, fill = NA, align = "right"),
av20 = rollmean(TS, k = 20, fill = NA, align = "right"),
av21 = rollmean(TS, k = 21, fill = NA, align = "right"),
av22 = rollmean(TS, k = 22, fill = NA, align = "right"),
av23 = rollmean(TS, k = 23, fill = NA, align = "right"),
av24 = rollmean(TS, k = 24, fill = NA, align = "right"),
av25 = rollmean(TS, k = 25, fill = NA, align = "right"),
av26 = rollmean(TS, k = 26, fill = NA, align = "right"),
av27 = rollmean(TS, k = 27, fill = NA, align = "right"),
av28 = rollmean(TS, k = 28, fill = NA, align = "right"),
av29 = rollmean(TS, k = 29, fill = NA, align = "right"),
av30 = rollmean(TS, k = 30, fill = NA, align = "right"))
ts.50yrs.df <- ts.50yrs[,-c(1:2)]
colnames(ts.50yrs.df)[1] <- "Year"
df.ts.50 <- full_join(ts,ts.50yrs.df)
我在最后一次观察之前的 100、150、200 和 250 年开始的不同年份重复了相同的程序。
例如
subset1766 <- data[1:1766,]
subset1766$Year.250 <- c(251:2016)
ts.250yrs <- subset1766%>%
mutate(av10 = rollmean(TS, k = 10, fill = NA, align = "right"),
av11 = rollmean(TS, k = 11, fill = NA, align = "right"),
av12 = rollmean(TS, k = 12, fill = NA, align = "right"),
av13 = rollmean(TS, k = 13, fill = NA, align = "right"),
av14 = rollmean(TS, k = 14, fill = NA, align = "right"),
av15 = rollmean(TS, k = 15, fill = NA, align = "right"),
av16 = rollmean(TS, k = 16, fill = NA, align = "right"),
av17 = rollmean(TS, k = 17, fill = NA, align = "right"),
av18 = rollmean(TS, k = 18, fill = NA, align = "right"),
av19 = rollmean(TS, k = 19, fill = NA, align = "right"),
av20 = rollmean(TS, k = 20, fill = NA, align = "right"),
av21 = rollmean(TS, k = 21, fill = NA, align = "right"),
av22 = rollmean(TS, k = 22, fill = NA, align = "right"),
av23 = rollmean(TS, k = 23, fill = NA, align = "right"),
av24 = rollmean(TS, k = 24, fill = NA, align = "right"),
av25 = rollmean(TS, k = 25, fill = NA, align = "right"),
av26 = rollmean(TS, k = 26, fill = NA, align = "right"),
av27 = rollmean(TS, k = 27, fill = NA, align = "right"),
av28 = rollmean(TS, k = 28, fill = NA, align = "right"),
av29 = rollmean(TS, k = 29, fill = NA, align = "right"),
av30 = rollmean(TS, k = 30, fill = NA, align = "right"))
ts.250yrs.df <- ts.250yrs[,-c(1:2)]
colnames(ts.250yrs.df)[1] <- "Year"
df.ts.250 <- full_join(ts,ts.50yrs.df)
然后我将数据框合并在一起。例如:
df <- cbind(df.ts.50, df.ts.100, df.ts.150, df.ts.200, df.ts.250)
但是,我接下来想做的是重复相同的移动平均线,起始年份从 50、51、52、...250 开始,这意味着从 1966 年 (2016-50) 开始,然后继续直到 1766 年 (2016-250),每个起始年的移动平均值范围为 10、11、12、...30 年。此外,所有值都需要与 2016 年保持一致。
所以,这将是一个包含 4221 列的数据框(201(= 50 到 250 年)* 21(= windows 从 10 到 30 年)),加上“年”的两列”和“TS”。列名称类似于“Year”、“TS”、“av50.10yr”、“av50.11yr”、“av50.12yr”、...、“av50.30yr”、“av51.10yr”、“ av51.11yr, "av51.12yr", ... "av51.30yr", ... "av250.10yr", "av250.11yr", "av250.12yr", ..., "av250.30yr" ) 和从第 1 年到 2016 年的 2016 行。
您可以将逻辑放入一个函数 makeAvg()
中,它仅需要年数并从 max(dat$year)
中减去,即 50
给出所需的 1:1966
,
# makeAvg <- \(s, ks=10:30, x='TS', data=dat) {
# sbs <- 1:(max(data$Year) - s)
# X <- data[sbs, ]
# cbind(X, sapply(ks, \(k) zoo::rollmean(X[[x]], k=k, fill=NA, align="right")) |>
# `colnames<-`(paste0('av', ks, '_', s)))
# }
(编辑:为避免速度变慢 zoo::rollmean
,试试下面这个版本,它使用 RcppRoll::roll_mean
大约是 快20倍.)
makeAvg <- \(s, ks=10:30, x='TS', data=dat) {
sbs <- 1:(max(data$Year) - s)
X <- data[sbs, ]
cbind(X,
sapply(ks, \(k) RcppRoll::roll_mean(X[[x]], n=k, fill=NA, align='r')) |>
`colnames<-`(paste0('av', ks, '_', s)))
}
注:R版本>=4.1
然后抛出这个:
makeAvg(50)[8:13, ]
# Year TS av10_50 av11_50 av12_50 av13_50 av14_50 av15_50 av16_50
# 8 8 -0.23 NA NA NA NA NA NA NA
# 9 9 -1.27 NA NA NA NA NA NA NA
# 10 10 -0.62 -0.448 NA NA NA NA NA NA
# 11 11 0.14 -0.332 -0.3945455 NA NA NA NA NA
# 12 12 -0.41 -0.375 -0.3390909 -0.3958333 NA NA NA NA
# 13 13 -1.31 -0.429 -0.4600000 -0.4200000 -0.4661538 NA NA NA
# av17_50 av18_50 av19_50 av20_50 av21_50 av22_50 av23_50 av24_50 av25_50 av26_50
# 8 NA NA NA NA NA NA NA NA NA NA
# 9 NA NA NA NA NA NA NA NA NA NA
# 10 NA NA NA NA NA NA NA NA NA NA
# 11 NA NA NA NA NA NA NA NA NA NA
# 12 NA NA NA NA NA NA NA NA NA NA
# 13 NA NA NA NA NA NA NA NA NA NA
# av27_50 av28_50 av29_50 av30_50
# 8 NA NA NA NA
# 9 NA NA NA NA
# 10 NA NA NA NA
# 11 NA NA NA NA
# 12 NA NA NA NA
# 13 NA NA NA NA
然后把starts
放到一个向量中,
starts <- seq(50, 250, 1)
你使用 lapply
在其上循环,然后用 merge()
:
将其放入 Reduce()
res <- Reduce(\(...) merge(..., all=TRUE), lapply(starts, makeAvg)) ## runs ~25 s
这给出了一个完全符合预期尺寸的数据框,
dim(res)
# [1] 1966 4223
还有这些名字:
names(res)
# [1] "Year" "TS" "av10_50" "av11_50" "av12_50" "av13_50" "av14_50"
# [8] "av15_50" "av16_50" "av17_50" "av18_50" "av19_50" "av20_50" "av21_50"
# [...]
数据:
set.seed(42)
dat <- data.frame(Year=c(1:2016),
TS=sample(seq(from=-1.5, to=1.5,by=0.01), size=2016, replace=TRUE))
在这里,我创建了一个自定义函数 moving_mean()
,它获取数据框和起始年份。然后,我使用 mutate
创建一个新列,其起始年份序列(例如 50)到 2016 年。然后,我创建了一个名称列表,这些名称将用于数据框中的列名称(例如,av50.10yr
).然后,我使用 map2
到 mutate
滚动年数的新列(例如,10、11、...、30)。然后,我使用 purrr::reduce
将给定起始年份(例如 50)的所有数据帧合并为一个。接下来,我在原始数据帧 ts
上从 50 到 250 的每个起始年份使用 map
到 运行 函数 moving_mean
。然后,我将 201 个数据帧放入同一个数据帧中(再次使用 reduce
)。最后,我将这个数据框加入了原始数据框,它向 ts
.
添加了 4,221 个新列
library(tidyverse)
moving_mean <- function(data, y){
new_year <- paste0("Year.", as.character(y))
x_new <- data %>%
filter(Year <= (tail(Year, n=1) - y)) %>%
rowwise %>%
dplyr::mutate({{new_year}} := (Year + y)) %>%
ungroup()
varnames <- unlist(map(10:30, function(x) paste0("av", y, ".", x, "yr")))
map2(10:30, varnames, function(x, y) x_new %>%
dplyr::mutate({{y}} := rollmean(TS, k = x, fill = NA, align = "right")) %>%
as.data.frame()) %>%
reduce(left_join, by = c(names(x_new)[1:3])) %>%
select(-c("Year", "TS")) %>%
dplyr::rename(Year = 1)
}
results <- map(c(seq(50, 250, 1)), ~ moving_mean(ts, .x)) %>%
reduce(left_join, by = "Year") %>%
left_join(ts, ., by = "Year")
输出(这里只打印一小部分)
dim(results)
# [1] 2016 4223
results[60:70, 1:15]
Year TS av50.10yr av50.11yr av50.12yr av50.13yr av50.14yr av50.15yr av50.16yr av50.17yr av50.18yr av50.19yr av50.20yr av50.21yr av50.22yr
60 60 0.63 -0.080 NA NA NA NA NA NA NA NA NA NA NA NA
61 61 -1.50 0.123 -0.01727273 NA NA NA NA NA NA NA NA NA NA NA
62 62 -0.07 0.295 0.15545455 0.02416667 NA NA NA NA NA NA NA NA NA NA
63 63 -0.84 0.338 0.30000000 0.17166667 0.04923077 NA NA NA NA NA NA NA NA NA
64 64 -0.36 0.317 0.39363636 0.35416667 0.23153846 0.1135714 NA NA NA NA NA NA NA NA
65 65 0.34 0.442 0.29000000 0.36250000 0.32846154 0.2164286 0.1073333 NA NA NA NA NA NA NA
66 66 -0.45 0.312 0.35363636 0.22166667 0.29384615 0.2671429 0.1666667 0.067500 NA NA NA NA NA NA
67 67 -1.17 0.415 0.25000000 0.29333333 0.17615385 0.2464286 0.2246667 0.133125 0.04176471 NA NA NA NA NA
68 68 -0.77 0.355 0.42363636 0.27166667 0.31000000 0.2000000 0.2640000 0.242500 0.15529412 0.06777778 NA NA NA NA
69 69 0.51 0.477 0.45090909 0.50583333 0.35923077 0.3885714 0.2806667 0.335625 0.31117647 0.22500000 0.1384211 NA NA NA
70 70 -0.25 0.331 0.42272727 0.40333333 0.45769231 0.3250000 0.3546667 0.255625 0.30882353 0.28722222 0.2068421 0.1255 NA NA
数据
set.seed(28)
ts <- data.frame(Year = c(1:2016),
TS = sample(
seq(from = -1.5, to = 1.5, by = 0.01),
size = 2016,
replace = TRUE
))
我有一个包含两个变量的数据框。 第一个变量是从 1 到 2016 连续跨越的年份,第二个变量是我的兴趣值。例如
ts <- data.frame(Year=c(1:2016), TS=sample(seq(from=-1.5, to=1.5,by=0.01), size=2016, replace=TRUE))
我需要计算从上次观察(2016-50=1966)前 50 年开始的 21 个移动平均值,每个都包含 window,范围从 10、11、12、...30 年,第一个值与 2016 年对齐。 我需要的是一个包含 23 列(“年”、“TS”、“av10”、“av11”、“av12”、...“av30”)和从 1 年到 2016 年的 2016 行的数据框。 因此,我这样做了:
subset1966 <- ts[1:1966,]
subset1966$Year.50 <- c(51:2016)
ts.50yrs <- subset1966%>%
mutate(av10 = rollmean(TS, k = 10, fill = NA, align = "right"),
av11 = rollmean(TS, k = 11, fill = NA, align = "right"),
av12 = rollmean(TS, k = 12, fill = NA, align = "right"),
av13 = rollmean(TS, k = 13, fill = NA, align = "right"),
av14 = rollmean(TS, k = 14, fill = NA, align = "right"),
av15 = rollmean(TS, k = 15, fill = NA, align = "right"),
av16 = rollmean(TS, k = 16, fill = NA, align = "right"),
av17 = rollmean(TS, k = 17, fill = NA, align = "right"),
av18 = rollmean(TS, k = 18, fill = NA, align = "right"),
av19 = rollmean(TS, k = 19, fill = NA, align = "right"),
av20 = rollmean(TS, k = 20, fill = NA, align = "right"),
av21 = rollmean(TS, k = 21, fill = NA, align = "right"),
av22 = rollmean(TS, k = 22, fill = NA, align = "right"),
av23 = rollmean(TS, k = 23, fill = NA, align = "right"),
av24 = rollmean(TS, k = 24, fill = NA, align = "right"),
av25 = rollmean(TS, k = 25, fill = NA, align = "right"),
av26 = rollmean(TS, k = 26, fill = NA, align = "right"),
av27 = rollmean(TS, k = 27, fill = NA, align = "right"),
av28 = rollmean(TS, k = 28, fill = NA, align = "right"),
av29 = rollmean(TS, k = 29, fill = NA, align = "right"),
av30 = rollmean(TS, k = 30, fill = NA, align = "right"))
ts.50yrs.df <- ts.50yrs[,-c(1:2)]
colnames(ts.50yrs.df)[1] <- "Year"
df.ts.50 <- full_join(ts,ts.50yrs.df)
我在最后一次观察之前的 100、150、200 和 250 年开始的不同年份重复了相同的程序。 例如
subset1766 <- data[1:1766,]
subset1766$Year.250 <- c(251:2016)
ts.250yrs <- subset1766%>%
mutate(av10 = rollmean(TS, k = 10, fill = NA, align = "right"),
av11 = rollmean(TS, k = 11, fill = NA, align = "right"),
av12 = rollmean(TS, k = 12, fill = NA, align = "right"),
av13 = rollmean(TS, k = 13, fill = NA, align = "right"),
av14 = rollmean(TS, k = 14, fill = NA, align = "right"),
av15 = rollmean(TS, k = 15, fill = NA, align = "right"),
av16 = rollmean(TS, k = 16, fill = NA, align = "right"),
av17 = rollmean(TS, k = 17, fill = NA, align = "right"),
av18 = rollmean(TS, k = 18, fill = NA, align = "right"),
av19 = rollmean(TS, k = 19, fill = NA, align = "right"),
av20 = rollmean(TS, k = 20, fill = NA, align = "right"),
av21 = rollmean(TS, k = 21, fill = NA, align = "right"),
av22 = rollmean(TS, k = 22, fill = NA, align = "right"),
av23 = rollmean(TS, k = 23, fill = NA, align = "right"),
av24 = rollmean(TS, k = 24, fill = NA, align = "right"),
av25 = rollmean(TS, k = 25, fill = NA, align = "right"),
av26 = rollmean(TS, k = 26, fill = NA, align = "right"),
av27 = rollmean(TS, k = 27, fill = NA, align = "right"),
av28 = rollmean(TS, k = 28, fill = NA, align = "right"),
av29 = rollmean(TS, k = 29, fill = NA, align = "right"),
av30 = rollmean(TS, k = 30, fill = NA, align = "right"))
ts.250yrs.df <- ts.250yrs[,-c(1:2)]
colnames(ts.250yrs.df)[1] <- "Year"
df.ts.250 <- full_join(ts,ts.50yrs.df)
然后我将数据框合并在一起。例如:
df <- cbind(df.ts.50, df.ts.100, df.ts.150, df.ts.200, df.ts.250)
但是,我接下来想做的是重复相同的移动平均线,起始年份从 50、51、52、...250 开始,这意味着从 1966 年 (2016-50) 开始,然后继续直到 1766 年 (2016-250),每个起始年的移动平均值范围为 10、11、12、...30 年。此外,所有值都需要与 2016 年保持一致。
所以,这将是一个包含 4221 列的数据框(201(= 50 到 250 年)* 21(= windows 从 10 到 30 年)),加上“年”的两列”和“TS”。列名称类似于“Year”、“TS”、“av50.10yr”、“av50.11yr”、“av50.12yr”、...、“av50.30yr”、“av51.10yr”、“ av51.11yr, "av51.12yr", ... "av51.30yr", ... "av250.10yr", "av250.11yr", "av250.12yr", ..., "av250.30yr" ) 和从第 1 年到 2016 年的 2016 行。
您可以将逻辑放入一个函数 makeAvg()
中,它仅需要年数并从 max(dat$year)
中减去,即 50
给出所需的 1:1966
,
# makeAvg <- \(s, ks=10:30, x='TS', data=dat) {
# sbs <- 1:(max(data$Year) - s)
# X <- data[sbs, ]
# cbind(X, sapply(ks, \(k) zoo::rollmean(X[[x]], k=k, fill=NA, align="right")) |>
# `colnames<-`(paste0('av', ks, '_', s)))
# }
(编辑:为避免速度变慢 zoo::rollmean
,试试下面这个版本,它使用 RcppRoll::roll_mean
大约是 快20倍.)
makeAvg <- \(s, ks=10:30, x='TS', data=dat) {
sbs <- 1:(max(data$Year) - s)
X <- data[sbs, ]
cbind(X,
sapply(ks, \(k) RcppRoll::roll_mean(X[[x]], n=k, fill=NA, align='r')) |>
`colnames<-`(paste0('av', ks, '_', s)))
}
注:R版本>=4.1
然后抛出这个:
makeAvg(50)[8:13, ]
# Year TS av10_50 av11_50 av12_50 av13_50 av14_50 av15_50 av16_50
# 8 8 -0.23 NA NA NA NA NA NA NA
# 9 9 -1.27 NA NA NA NA NA NA NA
# 10 10 -0.62 -0.448 NA NA NA NA NA NA
# 11 11 0.14 -0.332 -0.3945455 NA NA NA NA NA
# 12 12 -0.41 -0.375 -0.3390909 -0.3958333 NA NA NA NA
# 13 13 -1.31 -0.429 -0.4600000 -0.4200000 -0.4661538 NA NA NA
# av17_50 av18_50 av19_50 av20_50 av21_50 av22_50 av23_50 av24_50 av25_50 av26_50
# 8 NA NA NA NA NA NA NA NA NA NA
# 9 NA NA NA NA NA NA NA NA NA NA
# 10 NA NA NA NA NA NA NA NA NA NA
# 11 NA NA NA NA NA NA NA NA NA NA
# 12 NA NA NA NA NA NA NA NA NA NA
# 13 NA NA NA NA NA NA NA NA NA NA
# av27_50 av28_50 av29_50 av30_50
# 8 NA NA NA NA
# 9 NA NA NA NA
# 10 NA NA NA NA
# 11 NA NA NA NA
# 12 NA NA NA NA
# 13 NA NA NA NA
然后把starts
放到一个向量中,
starts <- seq(50, 250, 1)
你使用 lapply
在其上循环,然后用 merge()
:
Reduce()
res <- Reduce(\(...) merge(..., all=TRUE), lapply(starts, makeAvg)) ## runs ~25 s
这给出了一个完全符合预期尺寸的数据框,
dim(res)
# [1] 1966 4223
还有这些名字:
names(res)
# [1] "Year" "TS" "av10_50" "av11_50" "av12_50" "av13_50" "av14_50"
# [8] "av15_50" "av16_50" "av17_50" "av18_50" "av19_50" "av20_50" "av21_50"
# [...]
数据:
set.seed(42)
dat <- data.frame(Year=c(1:2016),
TS=sample(seq(from=-1.5, to=1.5,by=0.01), size=2016, replace=TRUE))
在这里,我创建了一个自定义函数 moving_mean()
,它获取数据框和起始年份。然后,我使用 mutate
创建一个新列,其起始年份序列(例如 50)到 2016 年。然后,我创建了一个名称列表,这些名称将用于数据框中的列名称(例如,av50.10yr
).然后,我使用 map2
到 mutate
滚动年数的新列(例如,10、11、...、30)。然后,我使用 purrr::reduce
将给定起始年份(例如 50)的所有数据帧合并为一个。接下来,我在原始数据帧 ts
上从 50 到 250 的每个起始年份使用 map
到 运行 函数 moving_mean
。然后,我将 201 个数据帧放入同一个数据帧中(再次使用 reduce
)。最后,我将这个数据框加入了原始数据框,它向 ts
.
library(tidyverse)
moving_mean <- function(data, y){
new_year <- paste0("Year.", as.character(y))
x_new <- data %>%
filter(Year <= (tail(Year, n=1) - y)) %>%
rowwise %>%
dplyr::mutate({{new_year}} := (Year + y)) %>%
ungroup()
varnames <- unlist(map(10:30, function(x) paste0("av", y, ".", x, "yr")))
map2(10:30, varnames, function(x, y) x_new %>%
dplyr::mutate({{y}} := rollmean(TS, k = x, fill = NA, align = "right")) %>%
as.data.frame()) %>%
reduce(left_join, by = c(names(x_new)[1:3])) %>%
select(-c("Year", "TS")) %>%
dplyr::rename(Year = 1)
}
results <- map(c(seq(50, 250, 1)), ~ moving_mean(ts, .x)) %>%
reduce(left_join, by = "Year") %>%
left_join(ts, ., by = "Year")
输出(这里只打印一小部分)
dim(results)
# [1] 2016 4223
results[60:70, 1:15]
Year TS av50.10yr av50.11yr av50.12yr av50.13yr av50.14yr av50.15yr av50.16yr av50.17yr av50.18yr av50.19yr av50.20yr av50.21yr av50.22yr
60 60 0.63 -0.080 NA NA NA NA NA NA NA NA NA NA NA NA
61 61 -1.50 0.123 -0.01727273 NA NA NA NA NA NA NA NA NA NA NA
62 62 -0.07 0.295 0.15545455 0.02416667 NA NA NA NA NA NA NA NA NA NA
63 63 -0.84 0.338 0.30000000 0.17166667 0.04923077 NA NA NA NA NA NA NA NA NA
64 64 -0.36 0.317 0.39363636 0.35416667 0.23153846 0.1135714 NA NA NA NA NA NA NA NA
65 65 0.34 0.442 0.29000000 0.36250000 0.32846154 0.2164286 0.1073333 NA NA NA NA NA NA NA
66 66 -0.45 0.312 0.35363636 0.22166667 0.29384615 0.2671429 0.1666667 0.067500 NA NA NA NA NA NA
67 67 -1.17 0.415 0.25000000 0.29333333 0.17615385 0.2464286 0.2246667 0.133125 0.04176471 NA NA NA NA NA
68 68 -0.77 0.355 0.42363636 0.27166667 0.31000000 0.2000000 0.2640000 0.242500 0.15529412 0.06777778 NA NA NA NA
69 69 0.51 0.477 0.45090909 0.50583333 0.35923077 0.3885714 0.2806667 0.335625 0.31117647 0.22500000 0.1384211 NA NA NA
70 70 -0.25 0.331 0.42272727 0.40333333 0.45769231 0.3250000 0.3546667 0.255625 0.30882353 0.28722222 0.2068421 0.1255 NA NA
数据
set.seed(28)
ts <- data.frame(Year = c(1:2016),
TS = sample(
seq(from = -1.5, to = 1.5, by = 0.01),
size = 2016,
replace = TRUE
))