Sum conditional on difference between timestamps
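Suppose we observe video game players collecting points. Each observation reports how many points the player has collected since the previous observation. I now want to create an additional variable that indicates how many points the user collected in the preceding x (for example, 90) seconds, including the current observation.
Example data: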
example_da = data.frame(time = c("2015-04-11 21:24:34", "2015-04-11 21:24:50", "2015-04-11 21:25:20", "2015-04-11 21:27:52", "2015-04-11 21:27:59", "2015-04-11 21:28:13",
"2015-04-11 21:30:06", "2015-04-11 21:31:05", "2015-04-11 21:31:47", "2015-04-11 21:38:01", "2015-04-11 21:39:05", "2015-04-11 21:40:06"),
points = c(2,3,1,6,2,5,1,1,3,5,2,4))
> example_da
time points
1 2015-04-11 21:24:34 2
2 2015-04-11 21:24:50 3
3 2015-04-11 21:25:20 1
4 2015-04-11 21:27:52 6
5 2015-04-11 21:27:59 2
6 2015-04-11 21:28:13 5
7 2015-04-11 21:30:06 1
8 2015-04-11 21:31:05 1
9 2015-04-11 21:31:47 3
10 2015-04-11 21:38:01 5
11 2015-04-11 21:39:05 2
12 2015-04-11 21:40:06 4
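For example, for observation 3 ("2015-04-11 21:25:20") we sum the points from "2015-04-11 21:24:34" (= 2), "2015-04-11 21:24:50" (= 3), and "2015-04-11 21:25:20" (= 1), because all of them were collected within the preceding 90 seconds, so the new variable "sum_points_preceding_90_seconds" is 6.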
> target_da = data.frame(time = c("2015-04-11 21:24:34", "2015-04-11 21:24:50", "2015-04-11 21:25:20", "2015-04-11 21:27:52", "2015-04-11 21:27:59", "2015-04-11 21:28:13",
+ "2015-04-11 21:30:06", "2015-04-11 21:31:05", "2015-04-11 21:31:47", "2015-04-11 21:38:01", "2015-04-11 21:39:05", "2015-04-11 21:40:06"),
+ points = c(2,3,1,6,2,5,1,1,3,5,2,4),
+ sum_points_preceding_90_seconds = c(2, 5, 6, 6, 8, 13, 1, 2, 5, 5, 7, 6))
>
>
> target_da
time points sum_points_preceding_90_seconds
1 2015-04-11 21:24:34 2 2
2 2015-04-11 21:24:50 3 5
3 2015-04-11 21:25:20 1 6
4 2015-04-11 21:27:52 6 6
5 2015-04-11 21:27:59 2 8
6 2015-04-11 21:28:13 5 13
7 2015-04-11 21:30:06 1 1
8 2015-04-11 21:31:05 1 2
9 2015-04-11 21:31:47 3 5
10 2015-04-11 21:38:01 5 5
11 2015-04-11 21:39:05 2 7
12 2015-04-11 21:40:06 4 6
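You can do this with slide_index_sum() from the slider package. It lets you specify an index and then builds boundaries before or after each element of that index to create sliding windows.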
I think your expected result for 2015-04-11 21:31:47 may be wrong? It looks like it should be 4 rather than 5.
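The arithmetic behind that suspicion can be checked by hand with base R. This is only a small sketch: the timestamps and points are copied from the example data (observations 7, 8 and 9), the variable names are made up for illustration, and it assumes the window covers the current second plus the 89 seconds before it.

```r
# Timestamps and points copied from the example data (observations 7, 8 and 9).
earlier <- as.POSIXct(c("2015-04-11 21:30:06", "2015-04-11 21:31:05"), tz = "UTC")
current <- as.POSIXct("2015-04-11 21:31:47", tz = "UTC")
earlier_points <- c(1, 1)
current_points <- 3

# Seconds elapsed between each earlier observation and the current one.
gap_secs <- as.numeric(difftime(current, earlier, units = "secs"))
gap_secs
#> [1] 101  42

# Assumption: only observations at most 89 seconds back count, plus the current one.
window_total <- current_points + sum(earlier_points[gap_secs <= 89])
window_total
#> [1] 4
```

The observation at 21:30:06 is 101 seconds old and falls outside the window, so only the points from 21:31:05 and 21:31:47 are summed.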
You may need to adjust before depending on your exact requirements.
library(slider)
library(dplyr)
example_da <- tibble(
  time = c(
    "2015-04-11 21:24:34", "2015-04-11 21:24:50", "2015-04-11 21:25:20",
    "2015-04-11 21:27:52", "2015-04-11 21:27:59", "2015-04-11 21:28:13",
    "2015-04-11 21:30:06", "2015-04-11 21:31:05", "2015-04-11 21:31:47",
    "2015-04-11 21:38:01", "2015-04-11 21:39:05", "2015-04-11 21:40:06"
  ),
  points = c(2, 3, 1, 6, 2, 5, 1, 1, 3, 5, 2, 4)
)
example_da <- mutate(example_da, time = as.POSIXct(time, tz = "UTC"))
# The current time + 89 seconds before it = 90 seconds total
example_da <- example_da %>%
  mutate(
    sum_points_preceding_90_seconds =
      slide_index_sum(
        x = points,
        i = time,
        before = 89
      )
  )
example_da
#> # A tibble: 12 × 3
#> time points sum_points_preceding_90_seconds
#> <dttm> <dbl> <dbl>
#> 1 2015-04-11 21:24:34 2 2
#> 2 2015-04-11 21:24:50 3 5
#> 3 2015-04-11 21:25:20 1 6
#> 4 2015-04-11 21:27:52 6 6
#> 5 2015-04-11 21:27:59 2 8
#> 6 2015-04-11 21:28:13 5 13
#> 7 2015-04-11 21:30:06 1 1
#> 8 2015-04-11 21:31:05 1 2
#> 9 2015-04-11 21:31:47 3 4
#> 10 2015-04-11 21:38:01 5 5
#> 11 2015-04-11 21:39:05 2 7
#> 12 2015-04-11 21:40:06 4 6
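As a cross-check that does not depend on slider, the same windowed sum can be sketched in base R. The helper name window_sum is hypothetical and not part of any package, and the sketch assumes the window is inclusive at both ends, running from `before` seconds earlier up to the current timestamp. It compares every pair of rows, so it is meant for verification on small data rather than as a replacement for slide_index_sum().

```r
times <- as.POSIXct(c(
  "2015-04-11 21:24:34", "2015-04-11 21:24:50", "2015-04-11 21:25:20",
  "2015-04-11 21:27:52", "2015-04-11 21:27:59", "2015-04-11 21:28:13",
  "2015-04-11 21:30:06", "2015-04-11 21:31:05", "2015-04-11 21:31:47",
  "2015-04-11 21:38:01", "2015-04-11 21:39:05", "2015-04-11 21:40:06"
), tz = "UTC")
points <- c(2, 3, 1, 6, 2, 5, 1, 1, 3, 5, 2, 4)

# Hypothetical helper: for each timestamp, sum the values whose timestamps lie
# in the inclusive window [time - before, time]. `before` is in seconds.
window_sum <- function(time, x, before) {
  secs <- as.numeric(time)  # seconds since the epoch
  vapply(seq_along(secs), function(k) {
    in_window <- secs >= secs[k] - before & secs <= secs[k]
    sum(x[in_window])
  }, numeric(1))
}

result <- window_sum(times, points, before = 89)
result
#> [1]  2  5  6  6  8 13  1  2  4  5  7  6
```

In this data no two observations are exactly 89 or 90 seconds apart, so before = 89 and before = 90 give the same sums here; the choice only matters when an earlier observation sits exactly on the window boundary.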