Sum conditional on difference between timestamps

Question

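Suppose we observe video game players collecting points. Each observation reports how many points the player has collected since the previous visit. I would now like to create an additional variable that indicates how many points the user collected in the past x (e.g. 90) seconds, including the current observation.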

Example data:

example_da = data.frame(time = c("2015-04-11 21:24:34", "2015-04-11 21:24:50", "2015-04-11 21:25:20", "2015-04-11 21:27:52", "2015-04-11 21:27:59", "2015-04-11 21:28:13",
                                 "2015-04-11 21:30:06", "2015-04-11 21:31:05", "2015-04-11 21:31:47", "2015-04-11 21:38:01", "2015-04-11 21:39:05", "2015-04-11 21:40:06"),
                        points = c(2,3,1,6,2,5,1,1,3,5,2,4))

> example_da
                  time points
1  2015-04-11 21:24:34      2
2  2015-04-11 21:24:50      3
3  2015-04-11 21:25:20      1
4  2015-04-11 21:27:52      6
5  2015-04-11 21:27:59      2
6  2015-04-11 21:28:13      5
7  2015-04-11 21:30:06      1
8  2015-04-11 21:31:05      1
9  2015-04-11 21:31:47      3
10 2015-04-11 21:38:01      5
11 2015-04-11 21:39:05      2
12 2015-04-11 21:40:06      4

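For example, for observation 3 ("2015-04-11 21:25:20"), we sum the points from "2015-04-11 21:24:34" (= 2), "2015-04-11 21:24:50" (= 3) and "2015-04-11 21:25:20" (= 1), because all of them were collected within the preceding 90 seconds. This gives 6 for the new variable "sum_points_preceding_90_seconds".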

> target_da = data.frame(time = c("2015-04-11 21:24:34", "2015-04-11 21:24:50", "2015-04-11 21:25:20", "2015-04-11 21:27:52", "2015-04-11 21:27:59", "2015-04-11 21:28:13",
+                                 "2015-04-11 21:30:06", "2015-04-11 21:31:05", "2015-04-11 21:31:47", "2015-04-11 21:38:01", "2015-04-11 21:39:05", "2015-04-11 21:40:06"),
+                        points = c(2,3,1,6,2,5,1,1,3,5,2,4),
+                        sum_points_preceding_90_seconds = c(2, 5, 6, 6, 8, 13, 1, 2, 5, 5, 7, 6))
> 
> 
> target_da                    
                  time points sum_points_preceding_90_seconds
1  2015-04-11 21:24:34      2                               2
2  2015-04-11 21:24:50      3                               5
3  2015-04-11 21:25:20      1                               6
4  2015-04-11 21:27:52      6                               6
5  2015-04-11 21:27:59      2                               8
6  2015-04-11 21:28:13      5                              13
7  2015-04-11 21:30:06      1                               1
8  2015-04-11 21:31:05      1                               2
9  2015-04-11 21:31:47      3                               5
10 2015-04-11 21:38:01      5                               5
11 2015-04-11 21:39:05      2                               7
12 2015-04-11 21:40:06      4                               6

Answer 1

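You can do this with slide_index_sum() from the slider package. It lets you specify an index and then sets boundaries before and/or after each element of that index to build the sliding windows.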
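In other words, for the observation whose time is t, the window covers every observation whose time lies between t - before and t + after (both bounds inclusive, as I read the slider documentation). As a rough illustration of that definition, here is a brute-force base R sketch that does not use slider at all. window_sum() is a hypothetical helper name; it assumes the timestamps are already parsed as POSIXct, so before and after are in seconds, and its O(n^2) cost makes it suitable for checking results rather than for large data.

```r
# Hypothetical helper: brute-force sliding-window sum over a time index.
# For each observation i, sum every x whose time lies in
# [time[i] - before, time[i] + after] (bounds in seconds, both inclusive).
window_sum <- function(x, time, before = 0, after = 0) {
  secs <- as.numeric(time)  # POSIXct -> seconds since the epoch
  vapply(
    seq_along(secs),
    function(i) sum(x[secs >= secs[i] - before & secs <= secs[i] + after]),
    numeric(1)
  )
}

time <- as.POSIXct(c(
  "2015-04-11 21:24:34", "2015-04-11 21:24:50", "2015-04-11 21:25:20",
  "2015-04-11 21:27:52", "2015-04-11 21:27:59", "2015-04-11 21:28:13",
  "2015-04-11 21:30:06", "2015-04-11 21:31:05", "2015-04-11 21:31:47",
  "2015-04-11 21:38:01", "2015-04-11 21:39:05", "2015-04-11 21:40:06"
), tz = "UTC")
points <- c(2, 3, 1, 6, 2, 5, 1, 1, 3, 5, 2, 4)

window_sum(points, time, before = 89)
#> [1]  2  5  6  6  8 13  1  2  4  5  7  6
```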

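I think your expected result for 2015-04-11 21:31:47 may be wrong: it looks like it should be 4 rather than 5. The observation at 21:30:06 is 101 seconds earlier, so only 21:31:05 (1 point) and 21:31:47 (3 points) fall inside the 90-second window.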

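You may need to adjust before to match your exact requirements. With before = 89, the window for each row spans [time - 89 s, time], so an observation exactly 90 seconds earlier is excluded; use before = 90 if it should be included.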
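To see what that boundary choice changes, here is a small sketch with two made-up timestamps exactly 90 seconds apart (illustrative data, not taken from the question). It assumes slide_index_sum() treats both window bounds as inclusive and interprets a numeric before as seconds for a POSIXct index.

```r
library(slider)

# Two made-up observations exactly 90 seconds apart (illustrative only).
t2 <- as.POSIXct(c("2015-04-11 21:24:34", "2015-04-11 21:26:04"), tz = "UTC")
x2 <- c(1, 1)

# before = 89: window is [t - 89 s, t], so the earlier point is excluded.
slide_index_sum(x2, t2, before = 89)
#> [1] 1 1

# before = 90: window is [t - 90 s, t], so the earlier point is included.
slide_index_sum(x2, t2, before = 90)
#> [1] 1 2
```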

library(slider)
library(dplyr)

example_da <- tibble(
  time = c(
    "2015-04-11 21:24:34", "2015-04-11 21:24:50", "2015-04-11 21:25:20", 
    "2015-04-11 21:27:52", "2015-04-11 21:27:59", "2015-04-11 21:28:13",
    "2015-04-11 21:30:06", "2015-04-11 21:31:05", "2015-04-11 21:31:47", 
    "2015-04-11 21:38:01", "2015-04-11 21:39:05", "2015-04-11 21:40:06"),
  points = c(2,3,1,6,2,5,1,1,3,5,2,4)
)

example_da <- mutate(example_da, time = as.POSIXct(time, tz = "UTC"))

# The current time + 89 seconds before it = 90 seconds total
example_da <- example_da %>%
  mutate(
    sum_points_preceding_90_seconds =
      slide_index_sum(
        x = points,
        i = time,
        before = 89
      )
  )

example_da
#> # A tibble: 12 × 3
#>    time                points sum_points_preceding_90_seconds
#>    <dttm>               <dbl>                           <dbl>
#>  1 2015-04-11 21:24:34      2                               2
#>  2 2015-04-11 21:24:50      3                               5
#>  3 2015-04-11 21:25:20      1                               6
#>  4 2015-04-11 21:27:52      6                               6
#>  5 2015-04-11 21:27:59      2                               8
#>  6 2015-04-11 21:28:13      5                              13
#>  7 2015-04-11 21:30:06      1                               1
#>  8 2015-04-11 21:31:05      1                               2
#>  9 2015-04-11 21:31:47      3                               4
#> 10 2015-04-11 21:38:01      5                               5
#> 11 2015-04-11 21:39:05      2                               7
#> 12 2015-04-11 21:40:06      4                               6
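If you would rather avoid the extra dependency, the same trailing sum can be sketched in base R with cumulative sums and findInterval(). This assumes the rows are already sorted by time and contain no missing timestamps; trailing_sum() is a hypothetical helper name, and before is in seconds with the same inclusive [t - before, t] window as the slide_index_sum() call above.

```r
# Hypothetical helper: sum of x over the trailing window [t - before, t]
# for each element, assuming `time` is sorted ascending and has no NAs.
trailing_sum <- function(x, time, before) {
  secs <- as.numeric(time)
  cs <- c(0, cumsum(x))  # cs[k + 1] = sum of the first k values
  # Number of observations strictly earlier than the window start t - before.
  n_excluded <- findInterval(secs - before, secs, left.open = TRUE)
  cs[seq_along(x) + 1] - cs[n_excluded + 1]
}

time <- as.POSIXct(c(
  "2015-04-11 21:24:34", "2015-04-11 21:24:50", "2015-04-11 21:25:20",
  "2015-04-11 21:27:52", "2015-04-11 21:27:59", "2015-04-11 21:28:13",
  "2015-04-11 21:30:06", "2015-04-11 21:31:05", "2015-04-11 21:31:47",
  "2015-04-11 21:38:01", "2015-04-11 21:39:05", "2015-04-11 21:40:06"
), tz = "UTC")
points <- c(2, 3, 1, 6, 2, 5, 1, 1, 3, 5, 2, 4)

trailing_sum(points, time, before = 89)
#> [1]  2  5  6  6  8 13  1  2  4  5  7  6
```

Because findInterval() does a binary search per row, this runs in O(n log n) instead of comparing every pair of rows.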
