R: Assigning a variable in data.table using a logical statement inside "()" in the ifelse function

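In a previous question, I asked for help assigning a "state" variable based on the time codes between events, i.e. between event==1 and event==2.

That solution uses the ifelse function, where the logical test checks whether the time variable lies between the time values of the start point and the end point.

My question now is how to group logical statements inside the ifelse function, so that an OR statement is evaluated first and an AND statement afterwards. To be concrete, I have the following data.table.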

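# Defining variables and data.table
library(data.table)
id <- rep(LETTERS[1:3], each = 5)
set.seed(123)
event <- c(sample(c(0,1),2,FALSE),sample(c(0,0,2),3,FALSE),
           sample(c(0,1),2,FALSE),sample(c(0,0,2),3,FALSE),
           sample(c(0,1),2,FALSE),sample(c(0,0,2),3,FALSE))
# Randomly relabel the three end events as 2 or 3
event[event==2] <- sample(c(2,3),3,TRUE)
state <- "NULL"
# Cumulative times within each id
time <- c(apply(matrix(runif(3*5),5,3),2,cumsum))
DT <- data.table(id,event,state,time)
# Copy row 13 into row 14 so that events 2 and 3 share one end time
DT[14,] <- DT[13,]
DT[14,event:=3]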

This produces the following data.table:

    id event state      time
 1:  A     0  NULL 0.3279207
 2:  A     1  NULL 1.2824244
 3:  A     0  NULL 2.1719637
 4:  A     3  NULL 2.8647671  <- Event 2 or 3 marks the end point
 5:  A     0  NULL 3.5052739
 6:  B     0  NULL 0.9942698
 7:  B     1  NULL 1.6499756
 8:  B     2  NULL 2.3585060  <- Event 2 or 3 marks the end point
 9:  B     0  NULL 2.9025721
10:  B     0  NULL 3.4967141
11:  C     1  NULL 0.2891597
12:  C     0  NULL 0.4362734
13:  C     2  NULL 1.3992976  <- Here both 2 and 3 appear at the same endpoint 
14:  C     3  NULL 1.3992976  <- Here both 2 and 3 appear at the same endpoint 
15:  C     0  NULL 2.9923019

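I want to assign the value 1 to the state variable for all observations between the start event (event==1) and the end point (event==2 OR event==3 OR both). So the correct result looks like this: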

    id event state      time
 1:  A     0  NULL 0.3279207
 2:  A     1     1 1.2824244
 3:  A     0     1 2.1719637
 4:  A     3     1 2.8647671
 5:  A     0  NULL 3.5052739
 6:  B     0  NULL 0.9942698
 7:  B     1     1 1.6499756
 8:  B     2     1 2.3585060
 9:  B     0  NULL 2.9025721
10:  B     0  NULL 3.4967141
11:  C     1     1 0.2891597
12:  C     0     1 0.4362734
13:  C     2     1 1.3992976
14:  C     3     1 1.3992976
15:  C     0  NULL 2.9923019

My first attempt was this code:

DT[,state:=ifelse(time>=time[event==1] & (time<=time[event==2] | time<=time[event==3]),1,state),by=id]

which gives the following error message:

Error in `[.data.table`(DT, , `:=`(state, ifelse(time >= time[event ==  : 
Type of RHS ('logical') must match LHS ('character'). To check and coerce would 
impact performance too much for the fastest cases. Either change the type of the target 
column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1)

This line of code produces the correct result,

DT[,state:=ifelse(time>=time[event==1] & time<=time[event==2 | event==3],1,state),by=id]

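but it issues a warning whenever time[event==2 | event==3] has length greater than 1 (as in group C, where both end events occur), because the comparison time<=time[event==2 | event==3] then recycles the shorter vector. So it is not an elegant solution, because it looks like a bug.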
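A minimal sketch of where that warning comes from, using the group C values from the table above as plain vectors (no data.table needed). The variable names `ends` and `msg` are only illustrative:

```r
# Times and events of group C; rows 3 and 4 are both end events
time  <- c(0.2891597, 0.4362734, 1.3992976, 1.3992976, 2.9923019)
event <- c(1, 0, 2, 3, 0)

ends <- time[event == 2 | event == 3]   # two end times
length(ends)

# time has 5 elements and ends has 2; 5 is not a multiple of 2,
# so the comparison recycles `ends` and R raises a warning.
# tryCatch captures the warning text instead of printing it.
msg <- tryCatch(time <= ends, warning = function(w) conditionMessage(w))
msg
```

The comparison still happens to give the right answer here only because both end times are identical.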

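How can I assign the value 1 to the state variable when the time lies between the start point and the end point, with the end point defined by an OR statement as in my first attempt?

Thanks a lot.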

You can solve it by defining two new columns.

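# Number the segments: each start event (event == 1) opens a new one
DT[, segment := cumsum(event == 1)]
# Within a segment, keep rows up to and including the first end event (2 or 3)
# (a second end event at the same time, as in row 14, comes after it and is not kept)
DT[, keep := cumsum(c(1, event[-.N]) %in% c(2, 3)) < 1, by = segment]
# Rows before the first start event belong to no segment
DT[segment == 0, keep := FALSE]
# state is a character column, so assign "1" rather than the number 1
DT[keep == TRUE, state := "1"]
DT[, segment := NULL]
DT[, keep := NULL]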

I'm not very proficient with data.table, so there may be a better way.

DT[, rows:=1:.N , by=id][
   , state:=ifelse(rows >= which(event==1) & rows <= max(which(event==2), which(event==3)), 1, state), by=id]
DT
    id event state      time rows
 1:  A     0  NULL 0.3279207    1
 2:  A     1     1 1.2824244    2
 3:  A     0     1 2.1719637    3
 4:  A     3     1 2.8647671    4
 5:  A     0  NULL 3.5052739    5
 6:  B     0  NULL 0.9942698    1
 7:  B     1     1 1.6499756    2
 8:  B     2     1 2.3585060    3
 9:  B     0  NULL 2.9025721    4
10:  B     0  NULL 3.4967141    5
11:  C     1     1 0.2891597    1
12:  C     0     1 0.4362734    2
13:  C     2     1 1.3992976    3
14:  C     3     1 1.3992976    4
15:  C     0  NULL 2.9923019    5

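The reason your first attempt fails is that time[event==2] or time[event==3] evaluates to numeric(0) when only one of the two events actually occurs. Comparing against numeric(0) yields logical(0), so the whole test collapses to length zero and ifelse returns logical(0), which is the 'logical' RHS named in the error message.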

DT[id=='A', time[event==2]]
## numeric(0)
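
The zero-length propagation can be reproduced with plain vectors. This is a minimal sketch with rounded, made-up times standing in for group A (one start event, one event 3, no event 2); the names `end2`, `cmp2`, `cmp3`, `test` and `res` are only illustrative:

```r
# Stand-in for group A: start at row 2, end event 3 at row 4, no event 2
time  <- c(0.33, 1.28, 2.17, 2.86, 3.51)
event <- c(0, 1, 0, 3, 0)

end2 <- time[event == 2]           # numeric(0): event 2 never occurs
cmp2 <- time <= end2               # logical(0): comparison with a zero-length vector
cmp3 <- time <= time[event == 3]   # ordinary length-5 logical

# A zero-length operand makes the result of | and & zero-length as well
test <- time >= time[event == 1] & (cmp2 | cmp3)
length(test)

# ifelse() returns a result shaped like its test, so this is logical(0)
res <- ifelse(test, 1, "NULL")
class(res)
```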

The simplest way to fix this is to take, for example, the maximum of the two times: time <= max(time[event %in% 2:3])

DT[, state := ifelse(time >= time[event==1] & time <= max(time[event %in% 2:3]), 1, state), by=id]
DT
##     id event state      time
##  1:  A     0  NULL 0.3279207
##  2:  A     1     1 1.2824244
##  3:  A     0     1 2.1719637
##  4:  A     3     1 2.8647671
##  5:  A     0  NULL 3.5052739
##  6:  B     0  NULL 0.9942698
##  7:  B     1     1 1.6499756
##  8:  B     2     1 2.3585060
##  9:  B     0  NULL 2.9025721
## 10:  B     0  NULL 3.4967141
## 11:  C     1     1 0.2891597
## 12:  C     0     1 0.4362734
## 13:  C     2     1 1.3992976
## 14:  C     3     1 1.3992976
## 15:  C     0  NULL 2.9923019
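
If the same test is needed in several places, it could be wrapped in a small helper. The sketch below is only an illustration: `in_window` is a hypothetical name, and returning all FALSE for a group without a start or end event, as well as using `min()` on the start times, are assumptions rather than anything stated in the question.

```r
# Hypothetical helper: TRUE for rows between the start event (1)
# and the last end event (2 or 3) of a single id
in_window <- function(time, event) {
  start <- time[event == 1]
  ends  <- time[event %in% 2:3]
  # Assumption: without a start or an end event there is no window
  if (length(start) == 0 || length(ends) == 0) {
    return(rep(FALSE, length(time)))
  }
  time >= min(start) & time <= max(ends)
}

# Group C from the table above, as plain vectors
time_c  <- c(0.2891597, 0.4362734, 1.3992976, 1.3992976, 2.9923019)
event_c <- c(1, 0, 2, 3, 0)
in_window(time_c, event_c)
## [1]  TRUE  TRUE  TRUE  TRUE FALSE
```

Inside the data.table it could then be used per group, for instance as DT[, state := ifelse(in_window(time, event), "1", state), by=id].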