Defining a variable by logical subsetting on a time interval in data.table
I have a data.table that looks like this:
id event state time
1: A 0 NULL 0.8998250
2: A 1 NULL 1.1459127
3: A 0 NULL 1.1879722
4: A 2 NULL 1.5158930
5: A 0 NULL 2.4703966
6: B 0 NULL 0.8895393
7: B 1 NULL 1.5823427
8: B 2 NULL 2.2228495
9: B 0 NULL 3.2171193
10: B 0 NULL 3.8728251
11: C 1 NULL 0.7085305
12: C 0 NULL 1.2525965
13: C 2 NULL 1.8467385
14: C 0 NULL 2.1358983
15: C 0 NULL 2.2830119
I want to set the variable state to 1 for the rows between event 1 and event 2. Each of the two events occurs exactly once per id, and event=1 always comes before event=2.
The code below generates the data.table above:
library(data.table)
# Defining variables and data.table
id <- rep(LETTERS[1:3],each=5)
set.seed(123)
event <- c(sample(c(0,1),2,F),sample(c(0,0,2),3,F),
sample(c(0,1),2,F),sample(c(0,0,2),3,F),
sample(c(0,1),2,F),sample(c(0,0,2),3,F))
state <- "NULL"
time <- c(apply(matrix(runif(3*5),5,3),2,cumsum))
DT <- data.table(id,event,state,time)
DT
I tried the following code to assign the value 1 to state for the rows between the time points of event==1 and event==2:
DT[time>=time[event==1] & time<=time[event==2],state:="1",by=id]
But this produces the following output:
id event state time
1: A 0 NULL 0.8998250
2: A 1 NULL 1.1459127
3: A 0 1 1.1879722
4: A 2 1 1.5158930
5: A 0 NULL 2.4703966
6: B 0 1 0.8895393
7: B 1 NULL 1.5823427
8: B 2 1 2.2228495
9: B 0 NULL 3.2171193
10: B 0 NULL 3.8728251
11: C 1 NULL 0.7085305
12: C 0 1 1.2525965
13: C 2 NULL 1.8467385
14: C 0 1 2.1358983
15: C 0 NULL 2.2830119
state=1 is clearly placed in the wrong rows of the data.table, and I don't understand what data.table is doing. Can you see why it behaves this way? Is there a good solution to my problem?
You were almost there. Try this:
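# evaluate the condition in j, so time[event==1] is looked up within each id;
# "1" (character) keeps the result the same type as the existing state column
DT[,state:= ifelse(time>=time[event==1] & time<=time[event==2],"1",state),by=id]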
# id event state time
# 1: A 0 NULL 0.8998250
# 2: A 1 1 1.1459127
# 3: A 0 1 1.1879722
# 4: A 2 1 1.5158930
# 5: A 0 NULL 2.4703966
# 6: B 0 NULL 0.8895393
# 7: B 1 1 1.5823427
# 8: B 2 1 2.2228495
# 9: B 0 NULL 3.2171193
#10: B 0 NULL 3.8728251
#11: C 1 1 0.7085305
#12: C 0 1 1.2525965
#13: C 2 1 1.8467385
#14: C 0 NULL 2.1358983
#15: C 0 NULL 2.2830119
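As for why the original attempt puts the 1s in the wrong rows: in data.table the i expression is evaluated once on the whole table, before by splits it into groups. Inside i, time[event==1] is therefore a vector with one value per id (three values here), and R recycles it row by row against the 15 values of time. Below is a minimal sketch of that effect. It rebuilds the table from the printed values instead of the random draws, since sample() under set.seed(123) may draw differently in newer R versions. The names lower, upper, recycled and grouped are made up for illustration.

```r
library(data.table)

# Rebuild the table from the values printed in the question
DT <- data.table(
  id    = rep(LETTERS[1:3], each = 5),
  event = c(0, 1, 0, 2, 0,  0, 1, 2, 0, 0,  1, 0, 2, 0, 0),
  state = "NULL",
  time  = c(0.8998250, 1.1459127, 1.1879722, 1.5158930, 2.4703966,
            0.8895393, 1.5823427, 2.2228495, 3.2171193, 3.8728251,
            0.7085305, 1.2525965, 1.8467385, 2.1358983, 2.2830119)
)

# Without grouping, these are length-3 vectors: one time per id
lower <- DT[event == 1, time]
upper <- DT[event == 2, time]

# Comparing them with the 15-element time column recycles them row by row,
# so row 3 (id A) is checked against the bounds of id C, and so on
recycled <- DT[, time >= lower & time <= upper]
which(recycled)
# 3 4 6 8 12 14   (the rows that got state "1" in the question)

# Evaluating the same condition in j, per group, uses each id's own bounds
grouped <- DT[, .(keep = time >= time[event == 1] & time <= time[event == 2]),
              by = id]$keep
which(grouped)
# 2 3 4 7 8 11 12 13
```

Moving the condition from i into j, as above, is what makes by=id apply to it.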
Instead of ifelse, we can use .I to extract the row indices and then assign '1' to state for those rows.
DT[DT[,.I[time>=time[event==1] & time<=time[event==2]],
by=id]$V1, state:='1'][]
# id event state time
# 1: A 0 NULL 0.8998250
# 2: A 1 1 1.1459127
# 3: A 0 1 1.1879722
# 4: A 2 1 1.5158930
# 5: A 0 NULL 2.4703966
# 6: B 0 NULL 0.8895393
# 7: B 1 1 1.5823427
# 8: B 2 1 2.2228495
# 9: B 0 NULL 3.2171193
#10: B 0 NULL 3.8728251
#11: C 1 1 0.7085305
#12: C 0 1 1.2525965
#13: C 2 1 1.8467385
#14: C 0 NULL 2.1358983
#15: C 0 NULL 2.2830119
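To make the two steps of the .I approach easier to follow, here is a small sketch that runs the inner and outer query separately. It assumes the table is rebuilt from the printed values rather than the random draws, and idx is just an illustrative name.

```r
library(data.table)

# Rebuild the table from the values printed in the question
DT <- data.table(
  id    = rep(LETTERS[1:3], each = 5),
  event = c(0, 1, 0, 2, 0,  0, 1, 2, 0, 0,  1, 0, 2, 0, 0),
  state = "NULL",
  time  = c(0.8998250, 1.1459127, 1.1879722, 1.5158930, 2.4703966,
            0.8895393, 1.5823427, 2.2228495, 3.2171193, 3.8728251,
            0.7085305, 1.2525965, 1.8467385, 2.1358983, 2.2830119)
)

# Inner query: within each id, .I holds that group's row numbers in the full
# table; subsetting .I by the condition keeps the matching global row numbers
idx <- DT[, .I[time >= time[event == 1] & time <= time[event == 2]], by = id]
idx$V1
# 2 3 4 7 8 11 12 13

# Outer query: a plain row-number subset in i, then assignment by reference
DT[idx$V1, state := "1"]

# Number of rows flagged per id
DT[state == "1", .N, by = id]
```

Because the outer i is now a vector of row numbers and not a logical condition, there is nothing left to recycle, and only the selected rows are touched.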