How to compute a new variable based on the number of days since a particular type of record

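I am trying to create a variable that shows the number of days since a particular event occurred. This is a follow-up to a previous question, using the same data.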

The data looks like this (note that dates are in DD/MM/YYYY format):

ID  date        drug  score
A   28/08/2016  2     3
A   29/08/2016  1     4
A   30/08/2016  2     4
A   2/09/2016   2     4
A   3/09/2016   1     4
A   4/09/2016   2     4
B   8/08/2016   1     3
B   9/08/2016   2     4
B   10/08/2016  2     3
B   11/08/2016  1     3
C   30/11/2016  2     4
C   2/12/2016   1     5
C   3/12/2016   2     1
C   5/12/2016   1     4
C   6/12/2016   2     4
C   8/12/2016   1     2
C   9/12/2016   1     2

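For 'drug': 1 = took the drug, 2 = did not take the drug.

Each time drug is 1, if that ID has an earlier record that also has drug == 1, I need to generate a new value 'lagtime' showing the number of days (not rows!) since the drug was last taken.

So the output I am looking for is: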

ID  date        drug  score  lagtime
A   28/08/2016  2     3
A   29/08/2016  1     4
A   30/08/2016  2     4
A   2/09/2016   2     4
A   3/09/2016   1     4      5
A   4/09/2016   2     4
B   8/08/2016   1     3
B   9/08/2016   2     4
B   10/08/2016  2     3
B   11/08/2016  1     3      3
C   30/11/2016  2     4
C   2/12/2016   1     5
C   3/12/2016   2     1
C   5/12/2016   1     4      3
C   6/12/2016   2     4
C   8/12/2016   1     2      3
C   9/12/2016   1     2      1

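So I need a way to generate (mutate?) this lagtime score, calculated as the date of each drug == 1 record minus the date of the previous drug == 1 record, grouped by ID. This has me completely stumped.

Code for the sample data: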

data<-data.frame(ID=c("A","A","A","A","A","A","B","B","B","B","C","C","C","C","C","C","C"),
                 date=as.Date(c("28/08/2016","29/08/2016","30/08/2016","2/09/2016","3/09/2016","4/09/2016","8/08/2016","9/08/2016","10/08/2016","11/08/2016","30/11/2016","2/12/2016","3/12/2016","5/12/2016","6/12/2016","8/12/2016","9/12/2016"),format= "%d/%m/%Y"),
                 drug=c(2,1,2,2,1,2,1,2,2,1,2,1,2,1,2,1,1),
                 score=c(3,4,4,4,4,4,3,4,3,3,4,5,1,4,4,2,2))

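We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(data)), group by 'ID', specify the 'i' (drug == 1), take the difference of 'date' (diff(date)), concatenate it with NA because the diff output is one element shorter than the original vector, convert to integer, and assign (:=) to create 'lagtime'. By default, all other rows will be NA. Note that diff works on consecutive rows, so this relies on the rows being sorted by date within each ID, as they are in the sample data.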

library(data.table)
setDT(data)[drug==1, lagtime := as.integer(c(NA, diff(date))), ID]
data
#    ID       date drug score lagtime
# 1:  A 2016-08-28    2     3      NA
# 2:  A 2016-08-29    1     4      NA
# 3:  A 2016-08-30    2     4      NA
# 4:  A 2016-09-02    2     4      NA
# 5:  A 2016-09-03    1     4       5
# 6:  A 2016-09-04    2     4      NA
# 7:  B 2016-08-08    1     3      NA
# 8:  B 2016-08-09    2     4      NA
# 9:  B 2016-08-10    2     3      NA
#10:  B 2016-08-11    1     3       3
#11:  C 2016-11-30    2     4      NA
#12:  C 2016-12-02    1     5      NA
#13:  C 2016-12-03    2     1      NA
#14:  C 2016-12-05    1     4       3
#15:  C 2016-12-06    2     4      NA
#16:  C 2016-12-08    1     2       3
#17:  C 2016-12-09    1     2       1
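
For comparison, here is a minimal base R sketch of the same idea, in case data.table is not available. It is not part of the answer above, just one possible way to do it: the names df and took are made up for illustration, and like the data.table version it assumes the rows are already sorted by date within each ID.

```r
# Rebuild the sample data from the question as a plain data.frame.
# 'df' is a hypothetical name chosen for this sketch.
df <- data.frame(
  ID = c("A","A","A","A","A","A","B","B","B","B","C","C","C","C","C","C","C"),
  date = as.Date(c("28/08/2016","29/08/2016","30/08/2016","2/09/2016",
                   "3/09/2016","4/09/2016","8/08/2016","9/08/2016",
                   "10/08/2016","11/08/2016","30/11/2016","2/12/2016",
                   "3/12/2016","5/12/2016","6/12/2016","8/12/2016",
                   "9/12/2016"), format = "%d/%m/%Y"),
  drug = c(2,1,2,2,1,2,1,2,2,1,2,1,2,1,2,1,1),
  score = c(3,4,4,4,4,4,3,4,3,3,4,5,1,4,4,2,2)
)

# Start with NA everywhere; only drug == 1 rows can receive a value.
df$lagtime <- NA_integer_
took <- df$drug == 1

# Among the drug == 1 rows, take the day difference between consecutive
# dates within each ID. ave() applies the function per group and returns
# the results in the original row order. The first drug == 1 row of each
# ID has no earlier dose, so it stays NA.
df$lagtime[took] <- as.integer(ave(
  as.numeric(df$date[took]), df$ID[took],
  FUN = function(d) c(NA, diff(d))
))

df
```

If the rows might not be in date order, sorting first (for example with df <- df[order(df$ID, df$date), ]) before computing the differences would be needed for either approach.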