Dropping subsequent observations after the occurrence of an event in a longitudinal data set
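I have a longitudinal data set and I want to drop observations after an event of interest has occurred. That is, I want to remove all observations that come after the dummy variable indicating the event of interest takes the value 1 (i.e. event == 1). The data look like this: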
id <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
time <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
event <- c(0,1,0,1,0,0,0,0,0,0,1,0,1,0,0)
df <- cbind(id,time,event)
id time event
[1,] 1 1 0
[2,] 1 2 1
[3,] 1 3 0
[4,] 2 1 1
[5,] 2 2 0
[6,] 2 3 0
[7,] 3 1 0
[8,] 3 2 0
[9,] 3 3 0
[10,] 4 1 0
[11,] 4 2 1
[12,] 4 3 0
[13,] 5 1 1
[14,] 5 2 0
[15,] 5 3 0
I want to drop all observations that follow the event (to be clear: event == 1) within each id, resulting in a data set like this:
id time event
[1,] 1 1 0
[2,] 1 2 1
[3,] 2 1 1
[4,] 3 1 0
[5,] 3 2 0
[6,] 3 3 0
[7,] 4 1 0
[8,] 4 2 1
[9,] 5 1 1
My main problem is how to make the removal of subsequent observations conditional on the time variable.
Thanks in advance! :D
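One way to state the time condition directly: after sorting by id and time, a row is kept as long as no event has occurred in an earlier row of the same id. A minimal base R sketch of that rule, assuming the data are stored as a data.frame (the names events_before and result are illustrative only):

```r
id    <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
time  <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
event <- c(0,1,0,1,0,0,0,0,0,0,1,0,1,0,0)
df <- data.frame(id, time, event)

# Put rows in time order within each id
df <- df[order(df$id, df$time), ]

# For each row, count the events that occurred strictly before it in the same id:
# cumsum() includes the current row, so subtract the current event value
events_before <- ave(df$event, df$id, FUN = function(e) cumsum(e) - e)

# Keep rows with no earlier event (the event row itself is kept)
result <- df[events_before == 0, ]
print(result, row.names = FALSE)
```

Because the rule only looks at rows earlier in time, it also handles an id with several events: everything after the first one is dropped.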
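If the data is constructed as a data.frame, we can use a group-by operation in dplyr: group by 'id', get the position index of the first occurrence of 1 and, if there is one, take the sequence up to that position; otherwise return the full row sequence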
library(dplyr)
df %>%
arrange(id, time) %>%
group_by(id) %>%
slice(if(1 %in% event) seq(match(1, event)) else row_number()) %>%
ungroup
-output
# A tibble: 9 x 3
# id time event
# <dbl> <dbl> <dbl>
#1 1 1 0
#2 1 2 1
#3 2 1 1
#4 3 1 0
#5 3 2 0
#6 3 3 0
#7 4 1 0
#8 4 2 1
#9 5 1 1
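To see what the slice() expression does within a single id, the same steps can be run on plain vectors. This is only an illustration: seq_along() stands in for row_number(), which is meant to be used inside dplyr verbs.

```r
# Event indicator for one id with the event in row 2 (like id 1 or id 4)
e_event <- c(0, 1, 0)
first <- match(1, e_event)   # position of the first 1: 2
keep_event <- seq(first)     # rows kept: 1 2

# An id without any event (like id 3): match() would return NA,
# so the if/else takes the else branch and keeps every row
e_none <- c(0, 0, 0)
has_event <- 1 %in% e_none      # FALSE
keep_none <- seq_along(e_none)  # rows kept: 1 2 3
```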
Or, if we specify nomatch as the number of rows (n()), we can make it shorter without the if/else:
df %>%
arrange(id, time) %>%
group_by(id) %>%
slice(seq(match(1, event, nomatch = n())))
data
df <- data.frame(id, time, event)
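A quick check of the nomatch trick on plain vectors, outside slice(). Here length(e) stands in for n(), which only works inside dplyr verbs:

```r
e <- c(0, 0, 0)
n <- length(e)                        # stands in for n() for this group
stop_at <- match(1, e, nomatch = n)   # no 1 found, so nomatch is returned: 3
keep <- seq(stop_at)                  # rows kept: 1 2 3

# With an event present, nomatch is ignored and the first match wins
keep_event <- seq(match(1, c(0, 1, 0), nomatch = 3))  # rows kept: 1 2
```

seq() with a single number k gives 1:k, which is safe here because stop_at is always at least 1 (a group always has at least one row).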
Solution with seq_len()
library(dplyr)
df %>%
arrange(id, time) %>%
group_by(id) %>%
slice(seq_len(min(which(event == 1), n())))
data
id <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
time <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
event <- c(0,1,0,1,0,0,0,0,0,0,1,0,1,0,0)
df <- data.frame(id,time,event)
# output:
# Groups: id [5]
id time event
<dbl> <dbl> <dbl>
1 1 1 0
2 1 2 1
3 2 1 1
4 3 1 0
5 3 2 0
6 3 3 0
7 4 1 0
8 4 2 1
9 5 1 1
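To see why min(which(event == 1), n()) covers both cases, the same expression can be evaluated per group on plain vectors. The helper rows_to_keep() is hypothetical and uses length(event) in place of n():

```r
# Hypothetical helper mirroring the slice() call for one id
rows_to_keep <- function(event) {
  # which() returns integer(0) when there is no event, so min() then
  # falls back to the group size and every row is kept
  seq_len(min(which(event == 1), length(event)))
}

rows_to_keep(c(0, 1, 0))  # 1 2   (like id 1 / id 4: keep up to the event)
rows_to_keep(c(1, 0, 0))  # 1     (like id 2 / id 5: event in the first row)
rows_to_keep(c(0, 0, 0))  # 1 2 3 (like id 3: no event, keep everything)
```

seq_len() is used rather than seq() because seq_len(k) always means "the first k positions", whereas seq() with a single argument behaves differently for 0 or for vectors of length greater than 1.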