How to know if the data is censored in a survival analysis using R

I have a dataset that looks like this (a meaningless example):

id <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
year <- c(1990, 1991, 1992, 1989, 1990, 1991, 1992, 1993, 1989, 1990, 1992, 1993)
event <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1)

# data.frame() rather than cbind(), which would return a matrix
df <- data.frame(id, year, event)

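Assume all three IDs should be observed continuously from 1989 until death. However, as you can see, id 1 is left-censored (no information from the start), id 2 is right-censored (no information from start or end), and id 3 has gaps in observation (information from the start and the end, but with gaps in between). This is easy to see in a small table, but it becomes difficult with large datasets.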

Edit: Is there a way to group by id and create a summary table with information about the completeness of the data, for example:

id   left-censored   right-censored   gaps in obs. 
1    1               0                0             
2    0               1                0
3    0               0                1

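You can group your data.frame (I use a tibble) by ID (I use dplyr) and then create new variables that indicate whether each ID's first observed year is 1989, whether the person died during the observation period, and whether the number of rows per ID equals the time span (max_year - min_year + 1). In this case I consider ID 2 not left-censored, since her first observed year is 1989, which you defined as the start year.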

library(tibble)
library(dplyr)


id <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
year <- c(1990, 1991, 1992, 1989, 1990, 1991, 1992, 1993, 1989, 1990, 1992, 1993)
deceased <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1) 

df <- tibble(id, year, deceased)

start_year <- 1989

df %>%
  group_by(id) %>%
  mutate(left_censored  = min(year) > start_year,           ## left-censored if the first observed year is after start_year
         right_censored = max(deceased) == 0,               ## right-censored if no death was recorded during observation
         has_gaps       = n() < max(year) - min(year) + 1)  ## has gaps if there are fewer rows than years spanned

Result:

# A tibble: 12 x 6
# Groups:   id [3]
      id  year deceased left_censored right_censored has_gaps
   <dbl> <dbl>    <dbl> <lgl>         <lgl>          <lgl>   
 1     1  1990        0 TRUE          FALSE          FALSE   
 2     1  1991        0 TRUE          FALSE          FALSE   
 3     1  1992        1 TRUE          FALSE          FALSE   
 4     2  1989        0 FALSE         TRUE           FALSE   
 5     2  1990        0 FALSE         TRUE           FALSE   
 6     2  1991        0 FALSE         TRUE           FALSE   
 7     2  1992        0 FALSE         TRUE           FALSE   
 8     2  1993        0 FALSE         TRUE           FALSE   
 9     3  1989        0 FALSE         FALSE          TRUE    
10     3  1990        0 FALSE         FALSE          TRUE    
11     3  1992        0 FALSE         FALSE          TRUE    
12     3  1993        1 FALSE         FALSE          TRUE 
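
Once an ID is flagged with has_gaps, it can help to see which years are actually missing. The following is only a sketch in base R, assuming one row per id and year as in the example data; `missing_by_id` is a name introduced here for illustration, not part of the original answer.

```r
# Example data from the question, with the event column named `deceased`
df <- data.frame(
  id       = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  year     = c(1990, 1991, 1992, 1989, 1990, 1991, 1992, 1993, 1989, 1990, 1992, 1993),
  deceased = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

# For each id, compare the full sequence of years between the first and last
# observation with the years that were actually observed.
missing_by_id <- lapply(
  split(df$year, df$id),
  function(y) base::setdiff(seq(min(y), max(y)), y)
)

missing_by_id
```

Each list element holds the unobserved years for one id; an empty vector means there are no gaps between that id's first and last observation.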

Edit: If you want an overview, you can add:

df %>%
  group_by(id) %>%
  mutate(left_censored  = min(year) > start_year,           ## left-censored if the first observed year is after start_year
         right_censored = max(deceased) == 0,               ## right-censored if no death was recorded during observation
         has_gaps       = n() < max(year) - min(year) + 1) %>%  ## has gaps if there are fewer rows than years spanned
  dplyr::distinct(id, left_censored, right_censored, has_gaps) %>%  ## keep one row per id
  ungroup() %>%
  summarise(left_censored = sum(left_censored), right_censored = sum(right_censored), has_gaps = sum(has_gaps))  ## totals over all ids

which gives:

# A tibble: 1 x 3
  left_censored right_censored has_gaps
          <int>          <int>    <int>
1             1              1        1

As I mentioned before: ID 2 is not considered left-censored here, because her start year is 1989.

Edit 2: If you remove ungroup(), you get the overview you asked for:

df %>%
  group_by(id) %>%
  mutate(left_censored  = min(year) > start_year,           ## left-censored if the first observed year is after start_year
         right_censored = max(deceased) == 0,               ## right-censored if no death was recorded during observation
         has_gaps       = n() < max(year) - min(year) + 1) %>%  ## has gaps if there are fewer rows than years spanned
  dplyr::distinct(id, left_censored, right_censored, has_gaps) %>%  ## keep one row per id, still grouped by id
  summarise(left_censored = sum(left_censored), right_censored = sum(right_censored), has_gaps = sum(has_gaps))  ## one summary row per id

which gives:

     id left_censored right_censored has_gaps
  <dbl>         <int>          <int>    <int>
1     1             1              0        0
2     2             0              1        0
3     3             0              0        1
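
The same per-id table can probably be produced in one step by calling summarise() directly on the grouped data instead of mutate() followed by distinct(). This is a sketch under the same assumptions as above (start_year is 1989, and "right-censored" means no death was recorded). `overview` is a name introduced here, and n_distinct(year) replaces n() so that accidental duplicate rows for the same year do not hide a gap.

```r
library(dplyr)
library(tibble)

df <- tibble(
  id       = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  year     = c(1990, 1991, 1992, 1989, 1990, 1991, 1992, 1993, 1989, 1990, 1992, 1993),
  deceased = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

start_year <- 1989  # assumed start of the observation window

# One row per id; the logical flags are converted to 0/1 to match the requested table
overview <- df %>%
  group_by(id) %>%
  summarise(
    left_censored  = as.integer(min(year) > start_year),
    right_censored = as.integer(max(deceased) == 0),
    has_gaps       = as.integer(n_distinct(year) < max(year) - min(year) + 1)
  )

overview
```

For the example data this gives the same flags per id as the table above. On a large dataset, filtering `overview` (for instance on has_gaps == 1) is a quick way to pull out the IDs that need a closer look.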