Defining a variable by logical subsetting on a time interval in data.table

I have a data.table that looks like this:

    id event state      time
 1:  A     0  NULL 0.8998250
 2:  A     1  NULL 1.1459127
 3:  A     0  NULL 1.1879722
 4:  A     2  NULL 1.5158930
 5:  A     0  NULL 2.4703966
 6:  B     0  NULL 0.8895393
 7:  B     1  NULL 1.5823427
 8:  B     2  NULL 2.2228495
 9:  B     0  NULL 3.2171193
10:  B     0  NULL 3.8728251
11:  C     1  NULL 0.7085305
12:  C     0  NULL 1.2525965
13:  C     2  NULL 1.8467385
14:  C     0  NULL 2.1358983
15:  C     0  NULL 2.2830119

I want to set the variable state to 1 for the rows between event 1 and event 2. Each of the two events occurs only once per id, and event=1 always comes before event=2.

The code below generates the data.table above:

library(data.table)

# Defining variables and data.table
id <- rep(LETTERS[1:3],each=5)
set.seed(123)
event <- c(sample(c(0,1),2,F),sample(c(0,0,2),3,F),
           sample(c(0,1),2,F),sample(c(0,0,2),3,F),
           sample(c(0,1),2,F),sample(c(0,0,2),3,F))
state <- "NULL"
time <- c(apply(matrix(runif(3*5),5,3),2,cumsum))
DT <- data.table(id,event,state,time) 
DT

I tried the following code to assign the value 1 to the state variable for the rows between the time points of event==1 and event==2:

DT[time>=time[event==1] & time<=time[event==2],state:="1",by=id]

But this produces the following output:

    id event state      time
 1:  A     0  NULL 0.8998250
 2:  A     1  NULL 1.1459127
 3:  A     0     1 1.1879722
 4:  A     2     1 1.5158930
 5:  A     0  NULL 2.4703966
 6:  B     0     1 0.8895393
 7:  B     1  NULL 1.5823427
 8:  B     2     1 2.2228495
 9:  B     0  NULL 3.2171193
10:  B     0  NULL 3.8728251
11:  C     1  NULL 0.7085305
12:  C     0     1 1.2525965
13:  C     2  NULL 1.8467385
14:  C     0     1 2.1358983
15:  C     0  NULL 2.2830119

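state=1 is clearly placed in the wrong rows of the data.table. I don't understand what data.table is doing. Can you see why data.table behaves this way? Is there a good solution to my problem?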

You were almost there. Try this:

# Use "1" (character) so the result has the same type as the existing state column
DT[,state:= ifelse(time>=time[event==1] & time<=time[event==2],"1",state),by=id]

#    id event state      time
# 1:  A     0  NULL 0.8998250
# 2:  A     1     1 1.1459127
# 3:  A     0     1 1.1879722
# 4:  A     2     1 1.5158930
# 5:  A     0  NULL 2.4703966
# 6:  B     0  NULL 0.8895393
# 7:  B     1     1 1.5823427
# 8:  B     2     1 2.2228495
# 9:  B     0  NULL 3.2171193
#10:  B     0  NULL 3.8728251
#11:  C     1     1 0.7085305
#12:  C     0     1 1.2525965
#13:  C     2     1 1.8467385
#14:  C     0  NULL 2.1358983
#15:  C     0  NULL 2.2830119
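
As for why the original attempt fails: in DT[i, j, by], the i expression is evaluated once on the whole table, before any grouping. So time[event==1] returns one value per id, and that vector is recycled down the rows instead of being matched to each id. A minimal sketch on a small made-up table shows the difference (toy, mask_whole and per_group are illustrative names, not from the question):

```r
library(data.table)

# Hypothetical toy table: two ids, three rows each
toy <- data.table(
  id    = rep(c("A", "B"), each = 3),
  event = c(0, 1, 2,  1, 0, 2),
  time  = c(1, 2, 3,  4, 5, 6)
)

# Evaluated on the whole table, which is how i is evaluated:
# time[event == 1] is c(2, 4) and gets recycled across all six rows
mask_whole <- toy[, time >= time[event == 1] & time <= time[event == 2]]
mask_whole
# FALSE FALSE  TRUE  TRUE FALSE  TRUE

# Evaluated in j with by = id: each group only sees its own event times
toy[, per_group := time >= time[event == 1] & time <= time[event == 2], by = id]
toy$per_group
# FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
```

Moving the condition into j, as in the ifelse line above, is what makes time[event==1] refer to a single value per id.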

Without ifelse, we can use .I to extract the row indices and then assign '1' to state for those rows.

DT[DT[,.I[time>=time[event==1] & time<=time[event==2]], by=id]$V1,
   state:='1'][]
#    id event state      time
# 1:  A     0  NULL 0.8998250
# 2:  A     1     1 1.1459127
# 3:  A     0     1 1.1879722
# 4:  A     2     1 1.5158930
# 5:  A     0  NULL 2.4703966
# 6:  B     0  NULL 0.8895393
# 7:  B     1     1 1.5823427
# 8:  B     2     1 2.2228495
# 9:  B     0  NULL 3.2171193
#10:  B     0  NULL 3.8728251
#11:  C     1     1 0.7085305
#12:  C     0     1 1.2525965
#13:  C     2     1 1.8467385
#14:  C     0  NULL 2.1358983
#15:  C     0  NULL 2.2830119
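
Another option, sketched here on the assumption that your data.table version supports non-equi joins (1.9.8 or later), is to compute the interval bounds per id first and then update through a join. The table toy and the names bounds, start and end are made up for illustration:

```r
library(data.table)

# Hypothetical toy table with the same layout as the question's DT
toy <- data.table(
  id    = rep(c("A", "B"), each = 3),
  event = c(0, 1, 2,  1, 0, 2),
  state = "NULL",
  time  = c(1, 2, 3,  4, 5, 6)
)

# One row per id holding the times of event 1 and event 2
# (assumes each event occurs exactly once per id, as stated in the question)
bounds <- toy[, .(start = time[event == 1], end = time[event == 2]), by = id]

# Non-equi update join: rows of toy whose time lies in [start, end]
# for the matching id get state set to "1" by reference
toy[bounds, on = .(id, time >= start, time <= end), state := "1"]

toy$state
# "NULL" "1" "1" "1" "1" "1"
```

This keeps the per-id bounds in a separate table, which can be convenient if you want to inspect or reuse them.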