Sum over past window-size dates per group
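This question is similar to "How do I do a conditional sum which only looks between certain date criteria", but it differs slightly and the answers there do not fit this problem. The main difference is that the date column within each group is not necessarily complete (i.e., some dates may be missing).
Input: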
input <- read.table(text="
2017-04-01 A 1
2017-04-02 B 2
2017-04-02 B 2
2017-04-02 C 2
2017-04-02 A 2
2017-04-03 C 3
2017-04-04 A 4
2017-04-05 B 5
2017-04-06 C 6
2017-04-07 A 7
2017-04-08 B 8
2017-04-09 C 9")
colnames(input) <- c("Date","Group","Score")
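Rule: for each group on each date, look back over 3 calendar dates (including the current date) and compute the sum of the scores.
Expected output: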
Date Group 3DaysSumPerGroup
2017-04-01 A 1 #1 previous two dates are not available. partial is allowed
2017-04-02 A 3 #2+1 both 4-01 and 4-02 are in the range
2017-04-04 A 6 #4+2
2017-04-07 A 7 #7
2017-04-02 B 4 # 2+2 at the same day
2017-04-05 B 5
2017-04-08 B 8
2017-04-02 C 2
2017-04-03 C 5
2017-04-06 C 6
2017-04-09 C 9
I tried using rollapply with partial=T, but the result does not seem correct.
library(dplyr); library(zoo)
input %>%
  group_by(Group) %>%
  arrange(Date) %>%
  mutate("3DaysSumPerGroup" = rollapply(data = Score, width = 3, align = "right",
                                        FUN = sum, partial = TRUE, fill = NA, na.rm = TRUE))  # na.rm is forwarded to sum()
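One likely reason the attempt above goes wrong is that width = 3 in rollapply counts rows, not calendar days, so gaps in the dates are ignored. The base-R sketch below illustrates the difference for group A only; the vectors `dates` and `scores` are copied by hand from the input, and the names `row_window` and `day_window` are purely illustrative.

```r
# Scores for group A, ordered by date (copied from the input above)
dates  <- as.Date(c("2017-04-01", "2017-04-02", "2017-04-04", "2017-04-07"))
scores <- c(1, 2, 4, 7)

# Trailing window of 3 ROWS (what width = 3 means), partial windows allowed
row_window <- sapply(seq_along(scores), function(i) {
  sum(scores[max(1, i - 2):i])
})

# Trailing window of 3 CALENDAR DAYS (the current date and the two before it)
day_window <- sapply(seq_along(scores), function(i) {
  sum(scores[dates >= dates[i] - 2 & dates <= dates[i]])
})

print(row_window)  # 1 3 7 13
print(day_window)  # 1 3 6 7
```

The two results diverge from the third row on, which is where the dates for group A start to have gaps.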
Here is a (supposedly efficient) solution that uses the new non-equi joins and the by = .EACHI feature in data.table (v1.9.8+)
library(data.table) #v1.10.4
## Convert to a proper date class, and add another column in order to define the range
setDT(input)[, c("Date", "Date2") := {
Date = as.IDate(Date)
Date2 = Date - 2L
.(Date, Date2)
}]
## Run a non-equi join against the unique Date/Group combination in input
## Sum the Scores on the fly
## You can ignore the second Date column
input[unique(input, by = c("Date", "Group")), ## This removes the dupes
on = .(Group, Date <= Date, Date >= Date2), ## The join condition
.(Score = sum(Score)), ## sum the scores
keyby = .EACHI] ## Run the sum by each row in unique(input, by = c("Date", "Group"))
# Group Date Date Score
# 1: A 2017-04-01 2017-03-30 1
# 2: A 2017-04-02 2017-03-31 3
# 3: A 2017-04-04 2017-04-02 6
# 4: A 2017-04-07 2017-04-05 7
# 5: B 2017-04-02 2017-03-31 4
# 6: B 2017-04-05 2017-04-03 5
# 7: B 2017-04-08 2017-04-06 8
# 8: C 2017-04-02 2017-03-31 2
# 9: C 2017-04-03 2017-04-01 5
# 10: C 2017-04-06 2017-04-04 6
# 11: C 2017-04-09 2017-04-07 9
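As a cross-check on the table above, the same sums can be reproduced without data.table. This is only a minimal base-R sketch, not the answer's method: it rebuilds the example data from scratch, loops over each distinct Group/Date pair and filters the rows that fall inside the 3-day window. The names `keys` and `in_window` are illustrative, and the loop is quadratic in the number of rows, so it is meant for verification on small data rather than for speed.

```r
# Rebuild the example data (same rows as the question's input)
input <- read.table(text = "
2017-04-01 A 1
2017-04-02 B 2
2017-04-02 B 2
2017-04-02 C 2
2017-04-02 A 2
2017-04-03 C 3
2017-04-04 A 4
2017-04-05 B 5
2017-04-06 C 6
2017-04-07 A 7
2017-04-08 B 8
2017-04-09 C 9", col.names = c("Date", "Group", "Score"),
  stringsAsFactors = FALSE)
input$Date <- as.Date(input$Date)

# One output row per distinct Group/Date pair, ordered by group then date
keys <- unique(input[, c("Group", "Date")])
keys <- keys[order(keys$Group, keys$Date), ]
rownames(keys) <- NULL

# For each pair, sum the scores of the same group inside the 3-day window
keys$Score <- sapply(seq_len(nrow(keys)), function(i) {
  in_window <- input$Group == keys$Group[i] &
    input$Date >= keys$Date[i] - 2 &
    input$Date <= keys$Date[i]
  sum(input$Score[in_window])
})

print(keys)
```

The Score column printed here should match the Score column of the join result, row for row.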