具有可变宽度的滚动 30 天几何平均值
rolling 30-day geometric mean with variable width
@ShirinYavari 对 的解决方案几乎是我所需要的,除了使用宽度为 2 的静态平均 window。我有一个包含来自多个站点的随机样本的数据集想要计算滚动的 30 天几何均值。我希望对给定样本的 30 天 window 内的所有样本进行平均,并且如果前面的样本在时间上相距较远或较近,宽度可能会发生变化,例如,您是否需要平均 2、3 或如果在给定样本的 30 天内有 1、2 或更多先前样本,则更多样本。
这是一些示例数据,加上我的代码尝试:
RESULT = c(50,900,25,25,125,50,25,25,2000,25,25,
25,25,25,25,25,25,325,25,300,475,25)
DATE = as.Date(c("2018-05-23","2018-06-05","2018-06-17",
"2018-08-20","2018-10-05","2016-05-22",
"2016-06-20","2016-07-25","2016-08-11",
"2017-07-21","2017-08-08","2017-09-18",
"2017-10-12","2011-04-19","2011-06-29",
"2011-08-24","2011-10-23","2012-06-28",
"2012-07-16","2012-08-14","2012-09-29",
"2012-10-24"))
FINAL_SITEID = c(rep("A", 5), rep("B", 8), rep("C", 9))
df=data.frame(FINAL_SITEID,DATE,RESULT)
data_roll <- df %>%
group_by(FINAL_SITEID) %>%
arrange(DATE) %>%
mutate(day=DATE-dplyr::lag(DATE, n=1),
day=replace_na(day, 1),
rnk=cumsum(c(TRUE, day > 30))) %>%
group_by(FINAL_SITEID, rnk) %>%
mutate(count=rowid(rnk)) %>%
mutate(GM30=rollapply(RESULT, width=count, geometric.mean, fill=RESULT, align="right"))
我收到此错误消息,看起来应该很容易修复,但我无法弄清楚:
Error: Column `rnk` must be length 5 (the group size) or one, not 6
根据日期时间计算滚动统计数据的最简单方法 windows 是 runner package. You don't have to hack around to get just 30-days windows. Function runner 允许您在滚动 window 中应用任何 R 函数。以下是 FINAL_SITEID
组中 30 天 geometric.mean 的示例:
library(psych)
library(runner)
df %>%
group_by(FINAL_SITEID) %>%
arrange(DATE) %>%
mutate(GM30 = runner(RESULT, k = 30, idx = DATE, f = geometric.mean))
# FINAL_SITEID DATE RESULT GM30
# <fct> <date> <dbl> <dbl>
# 1 C 2011-04-19 25 25.0
# 2 C 2011-06-29 25 25.0
# 3 C 2011-08-24 25 25.0
# 4 C 2011-10-23 25 25.0
# 5 C 2012-06-28 325 325.
# 6 C 2012-07-16 25 90.1
# 7 C 2012-08-14 300 86.6
# 8 C 2012-09-29 475 475.
# 9 C 2012-10-24 25 109.
# 10 B 2016-05-22 50 50.0
rollapply 的宽度参数可以是宽度向量,可以使用 findInterval
设置。 rollapply help file 的示例部分显示了这方面的示例,我们在下面使用它。
library(dplyr)
library(psych)
library(zoo)
data_roll <- df %>%
arrange(FINAL_SITEID, DATE) %>%
group_by(FINAL_SITEID) %>%
mutate(GM30 = rollapplyr(RESULT, 1:n() - findInterval(DATE - 30, DATE),
geometric.mean, fill = NA)) %>%
ungroup
给予:
# A tibble: 22 x 4
FINAL_SITEID DATE RESULT GM30
<fct> <date> <dbl> <dbl>
1 A 2018-05-23 50 50.0
2 A 2018-06-05 900 212.
3 A 2018-06-17 25 104.
4 A 2018-08-20 25 25.0
5 A 2018-10-05 125 125.
6 B 2016-05-22 50 50.0
7 B 2016-06-20 25 35.4
8 B 2016-07-25 25 25.0
9 B 2016-08-11 2000 224.
10 B 2017-07-21 25 25.0
# ... with 12 more rows
@ShirinYavari 对
这是一些示例数据,加上我的代码尝试:
RESULT = c(50,900,25,25,125,50,25,25,2000,25,25,
25,25,25,25,25,25,325,25,300,475,25)
DATE = as.Date(c("2018-05-23","2018-06-05","2018-06-17",
"2018-08-20","2018-10-05","2016-05-22",
"2016-06-20","2016-07-25","2016-08-11",
"2017-07-21","2017-08-08","2017-09-18",
"2017-10-12","2011-04-19","2011-06-29",
"2011-08-24","2011-10-23","2012-06-28",
"2012-07-16","2012-08-14","2012-09-29",
"2012-10-24"))
FINAL_SITEID = c(rep("A", 5), rep("B", 8), rep("C", 9))
df=data.frame(FINAL_SITEID,DATE,RESULT)
data_roll <- df %>%
group_by(FINAL_SITEID) %>%
arrange(DATE) %>%
mutate(day=DATE-dplyr::lag(DATE, n=1),
day=replace_na(day, 1),
rnk=cumsum(c(TRUE, day > 30))) %>%
group_by(FINAL_SITEID, rnk) %>%
mutate(count=rowid(rnk)) %>%
mutate(GM30=rollapply(RESULT, width=count, geometric.mean, fill=RESULT, align="right"))
我收到此错误消息,看起来应该很容易修复,但我无法弄清楚:
Error: Column `rnk` must be length 5 (the group size) or one, not 6
根据日期时间计算滚动统计数据的最简单方法 windows 是 runner package. You don't have to hack around to get just 30-days windows. Function runner 允许您在滚动 window 中应用任何 R 函数。以下是 FINAL_SITEID
组中 30 天 geometric.mean 的示例:
library(psych)
library(runner)
df %>%
group_by(FINAL_SITEID) %>%
arrange(DATE) %>%
mutate(GM30 = runner(RESULT, k = 30, idx = DATE, f = geometric.mean))
# FINAL_SITEID DATE RESULT GM30
# <fct> <date> <dbl> <dbl>
# 1 C 2011-04-19 25 25.0
# 2 C 2011-06-29 25 25.0
# 3 C 2011-08-24 25 25.0
# 4 C 2011-10-23 25 25.0
# 5 C 2012-06-28 325 325.
# 6 C 2012-07-16 25 90.1
# 7 C 2012-08-14 300 86.6
# 8 C 2012-09-29 475 475.
# 9 C 2012-10-24 25 109.
# 10 B 2016-05-22 50 50.0
rollapply 的宽度参数可以是宽度向量,可以使用 findInterval
设置。 rollapply help file 的示例部分显示了这方面的示例,我们在下面使用它。
library(dplyr)
library(psych)
library(zoo)
data_roll <- df %>%
arrange(FINAL_SITEID, DATE) %>%
group_by(FINAL_SITEID) %>%
mutate(GM30 = rollapplyr(RESULT, 1:n() - findInterval(DATE - 30, DATE),
geometric.mean, fill = NA)) %>%
ungroup
给予:
# A tibble: 22 x 4
FINAL_SITEID DATE RESULT GM30
<fct> <date> <dbl> <dbl>
1 A 2018-05-23 50 50.0
2 A 2018-06-05 900 212.
3 A 2018-06-17 25 104.
4 A 2018-08-20 25 25.0
5 A 2018-10-05 125 125.
6 B 2016-05-22 50 50.0
7 B 2016-06-20 25 35.4
8 B 2016-07-25 25 25.0
9 B 2016-08-11 2000 224.
10 B 2017-07-21 25 25.0
# ... with 12 more rows