Time interval calculation for duplicated IDs in R

I have a large dataset.

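  1. I want to create a column that shows, for each duplicated ID, the number of days between a row's start date and the end date on the previous row. For example, R1 is not duplicated, so I don't compute an interval for it. For R2, I first sort its rows by start date in increasing order. Then I compute the number of days between the second-earliest start date and the end date on the previous row. Next I compute the days between the third-earliest start date and the end date of the second-earliest row, and so on. I want to do the same for every other duplicated ID.

  2. Then I want to create a new column that computes the number of days in the same way as in part 1, but only among rows of a duplicated ID that share the same event level. How can I do this?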


ID<-c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
START<-c("3-4-2013","4-5-2018","4-5-2015","4-6-2011","5-5-2012","1-9-2010","23-4-1999","25-6-2011","3-6-2011","4-5-2014",
    "6-6-2016","5-7-2014","7-7-1990","3-3-1998","4-4-1990","7-8-2014","22-4-1970","23-5-1984")
End<-c("3-4-2014","4-5-2019","5-5-2015","4-6-2013","5-5-2014","1-9-2012","23-4-2010","25-6-2015","3-6-2013","6-5-2014",
    "6-8-2016","5-8-2014","7-9-1990","3-7-1998","4-9-1990","7-12-2014","22-7-1970","23-8-1984")
event<-c("a","b","b","s","s","f","f","b","b","a","a","a","s","c","c","b","m","a")
df<-data.frame(ID,START,End,event)

So the result would look like this:

ID     START       End     event   Time1                     Time2
1  R1  3-4-2013  3-4-2014     a     NA                        NA
14 R2  3-3-1998  3-7-1998     c     NA                        NA
15 R2  4-4-1990  4-9-1990     c    (4-4-1990)-(3-7-1998)   (4-4-1990)-(3-7-1998)
3  R2  4-5-2015  5-5-2015     b    (4-5-2015)-(4-9-1990)      NA
2  R2  4-5-2018  4-5-2019     b    (4-5-2018)-(5-5-2015)   (4-5-2018)-(5-5-2015)
16 R2  7-8-2014 7-12-2014     b    (7-8-2014)-(4-5-2019)   (7-8-2014)-(4-5-2019)
10 R3  4-5-2014  6-5-2014     a     NA                        NA
4  R3  4-6-2011  4-6-2013     s    (4-6-2011)-(6-5-2014)      NA
5  R3  5-5-2012  5-5-2014     s    (5-5-2012)-(4-6-2013)   (5-5-2012)-(4-6-2013)                    
12 R3  5-7-2014  5-8-2014     a    (5-7-2014)-(5-5-2014)   (5-7-2014)-(6-5-2014)
11 R3  6-6-2016  6-8-2016     a    (6-6-2016)-(5-8-2014)   (6-6-2016)-(5-8-2014)
13 R3  7-7-1990  7-9-1990     s                            (7-7-1990)-(5-5-2014)
6  R4  1-9-2010  1-9-2012     f
7  R4 23-4-1999 23-4-2010     f
8  R4 25-6-2011 25-6-2015     b
9  R4  3-6-2011  3-6-2013     b
17 R5 22-4-1970 22-7-1970     m
18 R6 23-5-1984 23-8-1984     a
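The rows in this expected output appear to be ordered by the START strings as text (for R3, 7-7-1990 comes last), which is what happens when day-month-year dates are left as character. A minimal sketch of the difference, using three of R3's start dates; `starts`, `text_order` and `date_order` are illustrative names only:

```r
# Three of R3's start dates, kept as character strings (day-month-year)
starts <- c("4-6-2011", "7-7-1990", "5-5-2012")

# Sorting the strings compares them as text, character by character
text_order <- sort(starts)
print(text_order)   # [1] "4-6-2011" "5-5-2012" "7-7-1990"

# Parsing to Date first gives chronological order
date_order <- sort(as.Date(starts, format = "%d-%m-%Y"))
print(date_order)   # [1] "1990-07-07" "2011-06-04" "2012-05-05"
```

This is why the dates need to be converted before sorting and subtracting.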

One way to do this is with the dplyr package, as follows (after first fixing your data frame, as shown):

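library(dplyr)
# stringsAsFactors = FALSE keeps the date columns as character so they can be parsed
df<-data.frame(ID,START,End,event, stringsAsFactors = FALSE)
# Parse the day-month-year strings into Date objects
df$START <- as.Date(df$START, format = '%d-%m-%Y')
df$End <- as.Date(df$End, format = '%d-%m-%Y')
# Sort each ID's rows by start date, then subtract the previous row's End from START
df %>% arrange(ID, START, End) %>% group_by(ID) %>%
  mutate(laggedTimeElapsed = difftime(START, lag(End), units = 'days'))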
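If you would rather avoid the dplyr dependency, the same part 1 calculation can be sketched in base R. This is a swapped-in technique, not the answer's own method: it sorts with `order()` and uses `ave()` to shift each ID's End dates down by one row, so the first row of every ID gets NA. The helper name `lag_gap` is hypothetical, and the gaps come out as plain numbers of days rather than `difftime` values.

```r
# Sample data from the question
ID <- c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
START <- c("3-4-2013","4-5-2018","4-5-2015","4-6-2011","5-5-2012","1-9-2010","23-4-1999","25-6-2011","3-6-2011","4-5-2014",
           "6-6-2016","5-7-2014","7-7-1990","3-3-1998","4-4-1990","7-8-2014","22-4-1970","23-5-1984")
End <- c("3-4-2014","4-5-2019","5-5-2015","4-6-2013","5-5-2014","1-9-2012","23-4-2010","25-6-2015","3-6-2013","6-5-2014",
         "6-8-2016","5-8-2014","7-9-1990","3-7-1998","4-9-1990","7-12-2014","22-7-1970","23-8-1984")
event <- c("a","b","b","s","s","f","f","b","b","a","a","a","s","c","c","b","m","a")
df <- data.frame(ID, START, End, event, stringsAsFactors = FALSE)
df$START <- as.Date(df$START, format = "%d-%m-%Y")
df$End <- as.Date(df$End, format = "%d-%m-%Y")

# Same row order as arrange(ID, START, End)
df <- df[order(df$ID, df$START, df$End), ]
rownames(df) <- NULL

# Hypothetical helper: days between each START and the previous row's End
# within the same group; the first row of each group has no predecessor (NA)
lag_gap <- function(start, end, group) {
  prev_end <- ave(as.numeric(end), group, FUN = function(x) c(NA, head(x, -1)))
  as.numeric(start) - prev_end
}

df$laggedTimeElapsed <- lag_gap(df$START, df$End, df$ID)
print(df)
```

The resulting column should match the `laggedTimeElapsed` values in the dplyr output further down.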

I'm not sure what you want in #2 above, but if you want to create an 'event duration' for each row, you can simply do the following:

df %>% arrange(ID, START, End) %>% group_by(ID) %>%
  mutate(laggedTimeElapsed = difftime(START, lag(End), units = 'days'), eventDuration = difftime(End, START, units = 'days'))
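If #2 instead means the same lagged gap restricted to rows that share both the ID and the event level (an assumption about the question), then in the dplyr pipeline above that amounts to grouping by both columns, `group_by(ID, event)`, after the `arrange()` step. Below is a self-contained base-R sketch on just the R3 rows, chosen for brevity because they contain two event levels; `prev_end` is a hypothetical helper. It computes both columns side by side so the difference is visible.

```r
# R3 rows from the question's data (subset chosen for brevity)
r3 <- data.frame(
  ID    = "R3",
  START = c("4-6-2011", "5-5-2012", "4-5-2014", "6-6-2016", "5-7-2014", "7-7-1990"),
  End   = c("4-6-2013", "5-5-2014", "6-5-2014", "6-8-2016", "5-8-2014", "7-9-1990"),
  event = c("s", "s", "a", "a", "a", "s"),
  stringsAsFactors = FALSE
)
r3$START <- as.Date(r3$START, format = "%d-%m-%Y")
r3$End   <- as.Date(r3$End, format = "%d-%m-%Y")

# Sort by start date so "previous row" means "previous in time"
r3 <- r3[order(r3$ID, r3$START, r3$End), ]
rownames(r3) <- NULL

# Hypothetical helper: previous End within the groups given in ...
# (NA for the first row of each group)
prev_end <- function(end, ...) {
  ave(as.numeric(end), ..., FUN = function(x) c(NA, head(x, -1)))
}

# Part 1: gap to the previous row of the same ID
r3$Time1 <- as.numeric(r3$START) - prev_end(r3$End, r3$ID)
# Part 2: gap to the previous row with the same ID and the same event
r3$Time2 <- as.numeric(r3$START) - prev_end(r3$End, r3$ID, r3$event)
print(r3)
```

The two columns differ only on the first "a" row (START 2014-05-04): Time1 measures against the preceding "s" row, while Time2 is NA because no earlier "a" row exists for R3.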

Output:

Source: local data frame [18 x 6]
Groups: ID [6]

      ID      START        End event laggedTimeElapsed eventDuration
   (chr)     (date)     (date) (chr)            (dfft)        (dfft)
1     R1 2013-04-03 2014-04-03     a           NA days      365 days
2     R2 1990-04-04 1990-09-04     c           NA days      153 days
3     R2 1998-03-03 1998-07-03     c         2737 days      122 days
4     R2 2014-08-07 2014-12-07     b         5879 days      122 days
5     R2 2015-05-04 2015-05-05     b          148 days        1 days
6     R2 2018-05-04 2019-05-04     b         1095 days      365 days
7     R3 1990-07-07 1990-09-07     s           NA days       62 days
8     R3 2011-06-04 2013-06-04     s         7575 days      731 days
9     R3 2012-05-05 2014-05-05     s         -395 days      730 days
10    R3 2014-05-04 2014-05-06     a           -1 days        2 days
11    R3 2014-07-05 2014-08-05     a           60 days       31 days
12    R3 2016-06-06 2016-08-06     a          671 days       61 days
13    R4 1999-04-23 2010-04-23     f           NA days     4018 days
14    R4 2010-09-01 2012-09-01     f          131 days      731 days
15    R4 2011-06-03 2013-06-03     b         -456 days      731 days
16    R4 2011-06-25 2015-06-25     b         -709 days     1461 days
17    R5 1970-04-22 1970-07-22     m           NA days       91 days
18    R6 1984-05-23 1984-08-23     a           NA days       92 days
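To see where these numbers come from, here is a quick check of two rows by hand. Row 3 is R2's second interval, which starts on 1998-03-03, while the previous R2 interval ended on 1990-09-04. Row 9 shows that a negative gap simply means the interval starts before the previous one ended, i.e. the two intervals overlap. The variable names are illustrative only.

```r
# Row 3: START of this R2 row minus End of the previous R2 row
gap <- difftime(as.Date("1998-03-03"), as.Date("1990-09-04"), units = "days")
print(gap)   # Time difference of 2737 days

# Row 9: R3's 2012-05-05 start minus the previous R3 end (2013-06-04)
overlap <- as.numeric(as.Date("2012-05-05") - as.Date("2013-06-04"))
print(overlap)   # [1] -395
```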