How to subtract two date variables in R, with the result in days

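I have the following two columns in my data frame, named Entry_date and Death_date, containing dates in the format YYYY/MM/DD. I want to subtract them (Death_date - Entry_date = survival_days), so that after subtracting Entry_date from Death_date I get the result in days. My data looks like this.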

Sample_ID<-c("a1","a2","a3","a4","a5","a6")
Entry_date<-c(2010/04/13, 2008/07/30, 2009/03/06, 2008/08/22, 2009/06/24, 2008/08/26)
Death_date<-c(2007/05/17, 2007/05/16, 2007/05/16, 2007/05/16,2007/05/16, 2010/05/16)
Df<-data.frame(Sample_ID,Entry_date,Death_date)

I want a column named Df$survival_days as the result variable, as shown below:

Sample_ID  Entry_date       Death_date      Df$survival_days   
                                                -1062.00
                                                -441.00
                                                -660.00
                                                -464.00
                                                -770.00
                                                 468.00

How can I do this in R? I need this variable for my Cox regression survival analysis. My real data frame has about 10,000 observations.

You can use difftime() with units = "days".

Use difftime with the appropriate units and supply the dates as strings:

Sample_ID<-c("a1","a2","a3","a4","a5","a6")
Entry_date<-c("2010/04/13", "2008/07/30", "2009/03/06", "2008/08/22", "2009/06/24", "2008/08/26")
Death_date<-c("2007/05/17", "2007/05/16", "2007/05/16", "2007/05/16","2007/05/16", "2010/05/16")
Df<-data.frame(Sample_ID,Entry_date,Death_date)
Df$difference_in_days <- difftime(Df$Death_date, Df$Entry_date, units = "days")

Output:

> Df
  Sample_ID Entry_date Death_date difference_in_days
1        a1 2010/04/13 2007/05/17    -1062.0000 days
2        a2 2008/07/30 2007/05/16     -441.0000 days
3        a3 2009/03/06 2007/05/16     -660.0417 days
4        a4 2008/08/22 2007/05/16     -464.0000 days
5        a5 2009/06/24 2007/05/16     -770.0000 days
6        a6 2008/08/26 2010/05/16      628.0000 days
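Note the -660.0417 in row 3. When difftime() is given strings, it converts them to date-times in the local time zone, so the extra 0.0417 days (one hour) is most likely a daylight-saving shift between the two dates. A minimal sketch of one way to avoid this, assuming the columns really are in YYYY/MM/DD form, is to parse them with as.Date() first so that only calendar dates are compared:

```r
# Same sample data as in the answer above.
Sample_ID  <- c("a1", "a2", "a3", "a4", "a5", "a6")
Entry_date <- c("2010/04/13", "2008/07/30", "2009/03/06", "2008/08/22", "2009/06/24", "2008/08/26")
Death_date <- c("2007/05/17", "2007/05/16", "2007/05/16", "2007/05/16", "2007/05/16", "2010/05/16")
Df <- data.frame(Sample_ID, Entry_date, Death_date, stringsAsFactors = FALSE)

# Parse the strings as calendar dates (no time of day, no local time zone).
Df$Entry_date <- as.Date(Df$Entry_date, format = "%Y/%m/%d")
Df$Death_date <- as.Date(Df$Death_date, format = "%Y/%m/%d")

# With Date inputs the difference is a whole number of days.
Df$difference_in_days <- difftime(Df$Death_date, Df$Entry_date, units = "days")
```

With Date inputs, row 3 should come out as exactly -660 days regardless of the machine's time zone.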

You can use lubridate and dplyr. But first, I changed your input data:

library(lubridate)  # provides ymd()
library(dplyr)      # provides mutate() and %>%

Sample_ID  <- c("a1","a2","a3","a4","a5","a6")
Entry_date <- c("2010/04/13", "2008/07/30", "2009/03/06", "2008/08/22", "2009/06/24", "2008/08/26")
Death_date <- c("2007/05/17", "2007/05/16", "2007/05/16", "2007/05/16","2007/05/16", "2010/05/16")

# ymd() parses the strings into Date objects
Df <- data.frame(Sample_ID,Entry_date=ymd(Entry_date),Death_date=ymd(Death_date), stringsAsFactors = FALSE)

With this data,

Df %>%
  mutate(survival_days=Death_date - Entry_date)

yields

  Sample_ID Entry_date Death_date survival_days
1        a1 2010-04-13 2007-05-17    -1062 days
2        a2 2008-07-30 2007-05-16     -441 days
3        a3 2009-03-06 2007-05-16     -660 days
4        a4 2008-08-22 2007-05-16     -464 days
5        a5 2009-06-24 2007-05-16     -770 days
6        a6 2008-08-26 2010-05-16      628 days
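Since the question mentions using this variable in a Cox regression, it may help to store it as a plain number rather than as a difftime object. The following is only a sketch in base R: survival_days_num is a hypothetical column name, and whether a given modelling function needs this conversion is an assumption to check against its documentation.

```r
# Same dates as above, parsed as Date objects.
Entry_date <- as.Date(c("2010/04/13", "2008/07/30", "2009/03/06",
                        "2008/08/22", "2009/06/24", "2008/08/26"), format = "%Y/%m/%d")
Death_date <- as.Date(c("2007/05/17", "2007/05/16", "2007/05/16",
                        "2007/05/16", "2007/05/16", "2010/05/16"), format = "%Y/%m/%d")
Df <- data.frame(Entry_date, Death_date)

# Subtracting two Date vectors gives a difftime measured in days.
Df$survival_days <- Df$Death_date - Df$Entry_date

# Drop the difftime class and keep the value in days as an ordinary numeric.
# survival_days_num is a hypothetical name used for illustration.
Df$survival_days_num <- as.numeric(Df$survival_days, units = "days")
```

The numeric column holds the same values as survival_days, without the "days" label.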