Calculating the number of days until the next date with a given number of events in R

I have a dataset containing dates and the number of events on each date:

    date    number_of_events
1/14/2013   1
2/6/2013    1
6/5/2013    1
7/1/2013    2
7/15/2013   1
7/19/2013   1
8/1/2013    2

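I want to calculate, for each row, the number of days until the next date on which the number of events equals 2 (or any other predetermined value).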

This is what I'm aiming for...

date    number_of_events    days_to_two_events
1/14/2013   1               168
2/6/2013    1               145
6/5/2013    1               26
7/1/2013    2               31
7/15/2013   1               17
7/19/2013   1               13
8/1/2013    2               0

Using dplyr and zoo:

# Keep date as character (not factor) so it can be parsed later
df <- read.table(text = "date    number_of_events
1/14/2013   1
2/6/2013    1
6/5/2013    1
7/1/2013    2
7/15/2013   1
7/19/2013   1
8/1/2013    2", header = TRUE, stringsAsFactors = FALSE)


library(dplyr)
library(zoo)

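df %>%
  # Take the next row's date when that row has 2 events, then fill the
  # remaining gaps backward from the next such date.
  mutate(days_to_two_events = na.locf0(ifelse(lead(number_of_events, 1) == 2, lead(date, 1), NA), fromLast = TRUE)) %>%
  # Parse both dates and subtract. The last row has no later row, so it yields NA.
  mutate(days_to_two_events = as.Date(days_to_two_events, format = "%m/%d/%Y") - as.Date(date, format = "%m/%d/%Y"))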

For the sake of completeness, here are two solutions that use a backward rolling join.

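The two solutions differ in how the day difference is computed for rows that themselves have number_of_events == 2. The first solution computes the day difference to the next row with number_of_events == 2, including the row itself. For such rows, days_to_two_events is therefore zero.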

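The second solution computes the day difference to the next row with number_of_events == 2, excluding the row itself. For a row with number_of_events == 2, it therefore looks up the date difference to the subsequent row with number_of_events == 2. This is the expected result.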

Both variants assume that df is already ordered by date.
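The difference between "including itself" and "excluding itself" can be illustrated with a minimal base R sketch. The helper days_to_next() and its include_self argument are hypothetical names made up for this illustration; they are not part of data.table or any other package.

```r
# Toy data from the question, parsed with base R
dates  <- as.Date(c("1/14/2013", "2/6/2013", "6/5/2013", "7/1/2013",
                    "7/15/2013", "7/19/2013", "8/1/2013"), format = "%m/%d/%Y")
events <- c(1, 1, 1, 2, 1, 1, 2)

# Hypothetical helper: days from each date to the next date whose event
# count equals `target`. Returns NA when no such later date exists.
days_to_next <- function(dates, events, target = 2, include_self = TRUE) {
  hits <- dates[events == target]
  vapply(seq_along(dates), function(i) {
    later <- if (include_self) hits[hits >= dates[i]] else hits[hits > dates[i]]
    if (length(later) == 0) NA_real_ else as.numeric(min(later) - dates[i])
  }, numeric(1))
}

inclusive <- days_to_next(dates, events, include_self = TRUE)
exclusive <- days_to_next(dates, events, include_self = FALSE)
print(inclusive)
print(exclusive)
```

In this sketch the exclusive version returns NA for the last row because no later row with 2 events exists, whereas the second data.table variant fills that row with 0 days. The `target` argument is there so that the same lookup works for any other predetermined value.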

First variant

library(data.table)
setDT(df)[, date := lubridate::mdy(date)]
df[number_of_events == 2][
  df, on = "date", roll = -Inf, 
  .(date, number_of_events = i.number_of_events, days_to_two_events = x.date - date)]
         date number_of_events days_to_two_events
1: 2013-01-14                1           168 days
2: 2013-02-06                1           145 days
3: 2013-06-05                1            26 days
4: 2013-07-01                2             0 days
5: 2013-07-15                1            17 days
6: 2013-07-19                1            13 days
7: 2013-08-01                2             0 days
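For readers unfamiliar with rolling joins: roll = -Inf matches each date to the nearest row of the filtered table at or after it (next observation carried backward). The following base R sketch reproduces that lookup with findInterval() under the same assumption that the dates are sorted. The variable names are made up for illustration, and this is not how data.table implements the join internally.

```r
# Toy data from the question, parsed with base R
dates  <- as.Date(c("1/14/2013", "2/6/2013", "6/5/2013", "7/1/2013",
                    "7/15/2013", "7/19/2013", "8/1/2013"), format = "%m/%d/%Y")
events <- c(1, 1, 1, 2, 1, 1, 2)

# Dates of the rows with exactly 2 events (must be sorted ascending)
hits <- dates[events == 2]

# With left.open = TRUE, findInterval() counts the hits strictly before each
# date; adding 1 gives the index of the first hit on or after that date.
idx <- findInterval(as.numeric(dates), as.numeric(hits), left.open = TRUE) + 1

# An index past the end of `hits` yields NA, i.e. no later match exists
next_hit <- hits[idx]
days_to_two_events <- as.numeric(next_hit - dates)
print(days_to_two_events)
```

The result matches the first variant: rows that themselves have 2 events match their own date and get 0 days.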

Second variant (expected result)

library(data.table)
# Start from the freshly loaded df; skip this line if date is already a Date
setDT(df)[, date := lubridate::mdy(date)]
df[number_of_events == 2][
  df, on = "date", roll = -Inf, 
  .(date, number_of_events = i.number_of_events, 
    days_to_two_events = shift(x.date, -1, fill = last(x.date)) - date)]
         date number_of_events days_to_two_events
1: 2013-01-14                1           168 days
2: 2013-02-06                1           145 days
3: 2013-06-05                1            26 days
4: 2013-07-01                2            31 days
5: 2013-07-15                1            17 days
6: 2013-07-19                1            13 days
7: 2013-08-01                2             0 days

Note the difference between the two solutions in row 4.

Data

library(data.table)
df <- fread(text = "date    number_of_events
1/14/2013   1
2/6/2013    1
6/5/2013    1
7/1/2013    2
7/15/2013   1
7/19/2013   1
8/1/2013    2")