How to calculate a moving average for different starting dates?

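I want to calculate a moving average for each participant in my dataset.

A participant may have several visit dates, and for each visit I want the mean of the past 3 days and of the past 2 days (excluding the visit day itself).

For example, take id=1, date=6/6/2017.

The mean of the past 2 days should be the mean of the values on 6/5/2017 and 6/4/2017.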
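To make the window concrete, here is a minimal sketch of which dates fall into the "past 2 days" for that visit. It uses base R `as.Date` only; the variable names `visit_date`, `n_days` and `window` are illustrative and not part of the dataset.

```r
# Visit date from the example, parsed as month/day/year
visit_date <- as.Date("6/6/2017", format = "%m/%d/%Y")
n_days <- 2

# The n_days calendar days before the visit; the visit day itself is excluded
window <- visit_date - seq_len(n_days)
format(window, "%m/%d/%Y")
```

Changing `n_days` to 3 would extend the same window back by one more day.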

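The sample datasets are generated as follows. I am working with a much larger dataset, with more participants, more visits, and more days of values, so I would like to find an efficient way to calculate these means.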

timeseries <- data.frame(id=c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3),
                     date=c("6/1/2017","6/2/2017","6/3/2017","6/4/2017","6/5/2017","6/6/2017",
                            "6/1/2017","6/2/2017","6/3/2017","6/4/2017","6/5/2017","6/6/2017",
                            "6/1/2017","6/2/2017","6/3/2017","6/4/2017","6/5/2017","6/6/2017"),
                     value=c(2,3,4,NA,6,7,
                             NA,9,5,NA,3,2,
                             5,7,3,8,3,5))
> timeseries
   id     date value
1   1 6/1/2017     2
2   1 6/2/2017     3
3   1 6/3/2017     4
4   1 6/4/2017    NA
5   1 6/5/2017     6
6   1 6/6/2017     7
7   2 6/1/2017    NA
8   2 6/2/2017     9
9   2 6/3/2017     5
10  2 6/4/2017    NA
...

visit <- data.frame(id=c(1,1,2,3,3,3),
                date=c("6/6/2017","6/5/2017",
                       "6/6/2017",
                       "6/6/2017","6/5/2017","6/4/2017"))

> visit
  id     date
1  1 6/6/2017
2  1 6/5/2017
3  2 6/6/2017
4  3 6/6/2017
5  3 6/5/2017
6  3 6/4/2017

The resulting table should look like this, where mean3 is the mean of the past 3 days and mean2 is the mean of the past 2 days:

> result
  id     date mean3 mean2
1  1 6/6/2017            
2  1 6/5/2017            
3  2 6/6/2017            
4  3 6/6/2017            
5  3 6/5/2017            
6  3 6/4/2017     

For each row of visit, subset the rows of timeseries with the same id, then take the mean of value over the n_days days before the visit date.

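library(lubridate)
n_days = 2  # window length in days; set to 3 for mean3
sapply(1:NROW(visit), function(i)
    # keep only the rows of timeseries for this visit's participant
    with(subset(x = timeseries,
                subset = timeseries$id == visit$id[i]),
         # keep values dated 1 to n_days days before the visit (visit day excluded)
         mean(x = value[difftime(time1 = mdy(visit$date[i]),
                                 time2 = mdy(date),
                                 units = "days") <= n_days &
                            difftime(time1 = mdy(visit$date[i]),
                                time2 = mdy(date),
                                units = "days") > 0],
              na.rm = TRUE)))  # NA values are dropped before averaging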
#[1] 6.0 4.0 3.0 5.5 5.5 5.0
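
To get the table asked for in the question, the same logic can be wrapped in a function and called once per window length. The following is only a sketch under a few assumptions: `past_mean` is a hypothetical helper name, base R `as.Date` is swapped in for `lubridate::mdy` so the block has no package dependency, and the dates are parsed once up front rather than once per visit.

```r
# Sample data copied from the question
timeseries <- data.frame(
  id    = rep(1:3, each = 6),
  date  = rep(c("6/1/2017", "6/2/2017", "6/3/2017",
                "6/4/2017", "6/5/2017", "6/6/2017"), times = 3),
  value = c(2, 3, 4, NA, 6, 7,
            NA, 9, 5, NA, 3, 2,
            5, 7, 3, 8, 3, 5),
  stringsAsFactors = FALSE
)
visit <- data.frame(
  id   = c(1, 1, 2, 3, 3, 3),
  date = c("6/6/2017", "6/5/2017", "6/6/2017",
           "6/6/2017", "6/5/2017", "6/4/2017"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean of `value` over the n_days days before each visit,
# excluding the visit day itself and ignoring NA values
past_mean <- function(visit, timeseries, n_days) {
  ts_date    <- as.Date(timeseries$date, format = "%m/%d/%Y")
  visit_date <- as.Date(visit$date, format = "%m/%d/%Y")
  sapply(seq_len(nrow(visit)), function(i) {
    # days between the visit and every observation (positive = before the visit)
    lag_days  <- as.numeric(visit_date[i] - ts_date)
    in_window <- timeseries$id == visit$id[i] &
      lag_days > 0 & lag_days <= n_days
    mean(timeseries$value[in_window], na.rm = TRUE)
  })
}

result <- visit
result$mean3 <- past_mean(visit, timeseries, n_days = 3)
result$mean2 <- past_mean(visit, timeseries, n_days = 2)
result
```

With the sample data, `result$mean2` reproduces the vector shown above. If a window contains no non-missing values, `mean(..., na.rm = TRUE)` returns `NaN` for that visit. This still scans all of `timeseries` for every visit, so for a much larger dataset a keyed or non-equi join may be worth considering instead.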