R: Calculating the number of occurrences within a specific time period in the past for each unique individual in a dataset

Question

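I am trying to count the number of times an event occurred for a given individual within a specific period in the past. In this particular case, for each new observation (reflecting a single scheduling request), I need to know how many trips the individual scheduled (trip_scheduled) in the previous 60 days. Ultimately I need to count the number of times that person cancelled on the day of a scheduled trip during the previous 60 days, but I am starting with just the count over the "moving" 60-day period.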

I found some elegant answers to a similar, but slightly different, question in this post:

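My situation differs in a couple of ways. First, I am trying to look at the preceding time period, and I don't know whether that changes my approach. Second, I need to run the analysis for more than 40,000 individuals. I have been trying to do this with a mix of code I found in another answer, a for loop (which I know is frowned upon), and dplyr grouping. It isn't working at all.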

Can anyone point me in the right direction? I'm happy to stick with dplyr and base. I just don't know much about data.table.

Here is the code and test data I have been trying to use:

test_set2 <- structure(list(tripID = c("20180112-100037-674-101", "20180112-100037-674-201", 
                                       "20180112-100037-674-301", "20180113-100037-676-101", "20180113-100037-676-201", 
                                       "20180115-100037-675-101", "20180115-100037-675-201", "20180116-100037-677-101", 
                                       "20180116-100037-677-201", "20180131-100037-678-101", "20180101-100146-707-101", 
                                       "20180101-100146-707-201", "20180102-100146-708-101", "20180102-100146-708-201", 
                                       "20180103-100146-709-101", "20180103-100146-709-201", "20180104-100146-710-101", 
                                       "20180104-100146-710-201", "20180105-100146-711-101", "20180105-100146-711-201", 
                                       "20180403-100532-223-101", "20180403-100532-223-201", "20180620-100532-224-101", 
                                       "20180620-100532-224-201", "20180704-100532-225-101", "20180704-100532-225-201", 
                                       "20180926-100532-228-101", "20180926-100532-228-201", "20180927-100532-226-101", 
                                       "20180927-100532-226-201"), CUSTOMER_ID = c(100037L, 100037L, 
                                                                                   100037L, 100037L, 100037L, 100037L, 100037L, 100037L, 100037L, 
                                                                                   100037L, 100146L, 100146L, 100146L, 100146L, 100146L, 100146L, 
                                                                                   100146L, 100146L, 100146L, 100146L, 100532L, 100532L, 100532L, 
                                                                                   100532L, 100532L, 100532L, 100532L, 100532L, 100532L, 100532L
                                       ), trip_date = structure(c(17543, 17543, 17543, 17544, 17544, 
                                                                  17546, 17546, 17547, 17547, 17562, 17532, 17532, 17533, 17533, 
                                                                  17534, 17534, 17535, 17535, 17536, 17536, 17624, 17624, 17702, 
                                                                  17702, 17716, 17716, 17800, 17800, 17801, 17801), class = "Date"), 
                            trip_scheduled = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
                                               1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), same_day_cancel = c(1, 
                                                                                                                       1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
                                                                                                                       0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -30L), groups = structure(list(
                                                                                                                         CUSTOMER_ID = c(100037L, 100146L, 100532L), .rows = list(
                                                                                                                           1:10, 11:20, 21:30)), row.names = c(NA, -3L), class = c("tbl_df", 
                                                                                                                                                                                   "tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
                                                                                                                                                                                                                                  "tbl_df", "tbl", "data.frame"))

running_frame <- test_set2[1,]

unique_customers <- unique(test_set2$CUSTOMER_ID)

for (cust in unique_customers){
  temp_events <- test_set2 %>% filter(CUSTOMER_ID == cust)
  cs = cumsum(temp_events$trip_scheduled) # cumulative number of trips of individual
  output_temp <- data.frame(temp_events, 
                            trips_minus_60 = cs[findInterval(temp_events$trip_date - 60, temp_events$trip_date, left.open = TRUE)] - cs)
  new_table <- rbind(new_table,output_temp)

}

Here is the error I most recently produced:

Error in data.frame(temp_events, trips_minus_60 = cs[findInterval(temp_events$trip_date - : arguments imply differing number of rows: 10, 0

Answer 1

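I'm not sure whether this meets your needs, but here is a tidyverse solution based on @Axeman's answer that you linked to. After group_by on your CUSTOMER_ID, you can, for each row, sum all rows where trip_scheduled is 1 and the date falls between the current date and 60 days before it. I expect you could do something similar for same_day_cancel.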

library(tidyverse)

test_set2 %>% 
  group_by(CUSTOMER_ID) %>%
    mutate(schedule_60 = unlist(map(trip_date, ~sum(trip_scheduled == 1 & between(trip_date, . - 60, .))))) %>%
  print(n=30)

# A tibble: 30 x 6
# Groups:   CUSTOMER_ID [3]
   tripID                  CUSTOMER_ID trip_date  trip_scheduled same_day_cancel schedule_60
   <chr>                         <int> <date>              <dbl>           <dbl>       <int>
 1 20180112-100037-674-101      100037 2018-01-12              1               1           3
 2 20180112-100037-674-201      100037 2018-01-12              1               1           3
 3 20180112-100037-674-301      100037 2018-01-12              1               1           3
 4 20180113-100037-676-101      100037 2018-01-13              1               0           5
 5 20180113-100037-676-201      100037 2018-01-13              1               0           5
 6 20180115-100037-675-101      100037 2018-01-15              1               1           7
 7 20180115-100037-675-201      100037 2018-01-15              1               1           7
 8 20180116-100037-677-101      100037 2018-01-16              1               0           9
 9 20180116-100037-677-201      100037 2018-01-16              1               0           9
10 20180131-100037-678-101      100037 2018-01-31              1               0          10
11 20180101-100146-707-101      100146 2018-01-01              1               1           2
12 20180101-100146-707-201      100146 2018-01-01              1               1           2
13 20180102-100146-708-101      100146 2018-01-02              1               1           4
14 20180102-100146-708-201      100146 2018-01-02              1               1           4
15 20180103-100146-709-101      100146 2018-01-03              1               1           6
16 20180103-100146-709-201      100146 2018-01-03              1               1           6
17 20180104-100146-710-101      100146 2018-01-04              1               1           8
18 20180104-100146-710-201      100146 2018-01-04              1               1           8
19 20180105-100146-711-101      100146 2018-01-05              1               1          10
20 20180105-100146-711-201      100146 2018-01-05              1               1          10
21 20180403-100532-223-101      100532 2018-04-03              1               0           2
22 20180403-100532-223-201      100532 2018-04-03              1               0           2
23 20180620-100532-224-101      100532 2018-06-20              1               0           2
24 20180620-100532-224-201      100532 2018-06-20              1               0           2
25 20180704-100532-225-101      100532 2018-07-04              1               0           4
26 20180704-100532-225-201      100532 2018-07-04              1               0           4
27 20180926-100532-228-101      100532 2018-09-26              1               0           2
28 20180926-100532-228-201      100532 2018-09-26              1               0           2
29 20180927-100532-226-101      100532 2018-09-27              1               0           4
30 20180927-100532-226-201      100532 2018-09-27              1               0           4
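
As a rough sketch of the "something similar for same_day_cancel" idea, the same pattern can sum cancellations instead of counting scheduled trips. The small `trips` data frame, the column names `cancel_60` and `cancel_60_fast`, and the helper `window_sum` are all assumptions made for illustration and are not part of the original post. The helper is one possible vectorized variant built on `cumsum` and `findInterval`, which may scale better across many customers; it assumes dates are sorted within each customer and pads the cumulative sum with a leading 0 so that a `findInterval` result of 0 still indexes a value. As in the code above, the window is inclusive of the current day.

```r
library(dplyr)
library(purrr)

# Small illustrative data set (hypothetical values, not the asker's data)
trips <- tibble(
  CUSTOMER_ID     = c(1L, 1L, 1L, 2L, 2L),
  trip_date       = as.Date(c("2018-01-01", "2018-01-10", "2018-04-01",
                              "2018-01-05", "2018-01-06")),
  same_day_cancel = c(1, 0, 1, 1, 1)
)

# Hypothetical helper: sum of x over the window [d - days, d] for each date d.
# Assumes `dates` is sorted in ascending order.
window_sum <- function(dates, x, days = 60) {
  d  <- as.numeric(dates)
  cs <- c(0, cumsum(x))                              # leading 0 handles index 0
  hi <- findInterval(d, d)                           # number of dates <= d
  lo <- findInterval(d - days, d, left.open = TRUE)  # number of dates <  d - days
  cs[hi + 1] - cs[lo + 1]
}

result <- trips %>%
  arrange(CUSTOMER_ID, trip_date) %>%
  group_by(CUSTOMER_ID) %>%
  mutate(
    # Row-by-row version, mirroring the answer's approach
    cancel_60 = map_dbl(
      trip_date,
      ~ sum(same_day_cancel[trip_date >= .x - 60 & trip_date <= .x])
    ),
    # Vectorized version using the helper above
    cancel_60_fast = window_sum(trip_date, same_day_cancel)
  ) %>%
  ungroup()

print(result)
```

To count only strictly earlier days, one option would be to change `trip_date <= .x` to `trip_date < .x` in the row-by-row version.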

