两个日期段的部分或全部重叠

Partial or Total Overlap of two Date periods

我想知道在特定的时间段内哪些观察是活跃的

我真正的 objective 是了解患者在怀孕期间(9 个月)期间处于活动状态的诊断 (dx)。必须考虑到患者一生中可能有不同的怀孕次数,也可能有不同的诊断次数(dx 可以或不能激活)。

我已经尝试 foverlaps like here or but weren't exactly what I was looking for. The real problem is this 并且它有很好的文档记录但不适用于 R。

Here 我认为他们让它适用于 SQL。所以我希望它是可以解决的...

我也试过 this 之类的 Non-Equi Join,但我无法让它按照我想要的方式工作...

让我们开始吧:

我有一个数据库,其中包含患者 (id) 诊断 (dx)、注册时间 (InD_dx) 和结束时间 (EndD_dx),如下所示:

id <- rep("a", 11)
InD_dx <- as.Date(c("2005-10-04","2005-10-06","2005-10-06", "2008-04-07", "2010-05-10", "2012-04-24", "2012-04-24", "2012-05-15", "2014-03-20", "2014-04-22", "2017-11-30"), format = "%Y-%m-%d")
EndD_dx <- as.Date(c("2020-12-31","2020-12-31","2020-12-31", "2008-11-05", "2011-01-17", "2015-07-20", "2013-01-01", "2015-07-20", "2015-12-04", "2020-12-31", "2020-12-31"), format = "%Y-%m-%d")
dx <- c("A", "B", "C", "P", "P", "D", "P", "E", "F", "G", "H")

DT = data.table(id,InD_dx,EndD_dx, dx)

    DT
    id   InD_dx    EndD_dx   dx
 1:  a 2005-10-04 2020-12-31  A
 2:  a 2005-10-06 2020-12-31  B
 3:  a 2005-10-06 2020-12-31  C
 4:  a 2008-04-07 2008-11-05  P
 5:  a 2010-05-10 2011-01-17  P
 6:  a 2012-04-24 2015-07-20  D
 7:  a 2012-04-24 2013-01-01  P
 8:  a 2012-05-15 2015-07-20  E
 9:  a 2014-03-20 2015-12-04  F
10:  a 2014-04-22 2020-12-31  G
11:  a 2017-11-30 2020-12-31  H

我选了怀孕的dx(P),分开做了一个DB。还有一个 dx 不同于 P 的数据库,如下所示:

Pregnancies <- copy(DT[dx== "P"])
Pregnancies 

    id InD_dx     EndD_dx   dx
 1: a 2008-04-07 2008-11-05 P
 2: a 2010-05-10 2011-01-17 P
 3: a 2012-04-24 2013-01-01 P

Dx_Other_than_Pregnancies <- copy(DT[dx!= "P"])
Dx_Other_than_Pregnancies

   id InD_dx     EndD_dx    dx
 1: a 2005-10-04 2020-12-31 A
 2: a 2005-10-06 2020-12-31 B
 3: a 2005-10-06 2020-12-31 C
 4: a 2012-04-24 2015-07-20 D
 5: a 2012-05-15 2015-07-20 E
 6: a 2014-03-20 2015-12-04 F
 7: a 2014-04-22 2020-12-31 G
 8: a 2017-11-30 2020-12-31 H

我期待将它们合并并通过怀孕对主动诊断进行分组。这就是为什么我尝试使用 foverlap:

setkey(Dx_Other_than_Pregnancies, id, InD_dx, EndD_dx)

foverlaps(Pregnancies, Dx_Other_than_Pregnancies,
          by.x=c('id', 'InD_dx', 'EndD_dx'), type='within', mult= "all", nomatch = 0L)

和非相等连接:

   Dx_Other_than_Pregnancies[Pregnancies, on=.(id, dx_Ini<=dxIni , EndD_dx>=EndD_dx)]

最后都像:

     id     InD_dx    EndD_dx dx   i.InD_dx  i.EndD_dx i.dx
 1:  a 2005-10-04 2020-12-31  A 2008-04-07 2008-11-05    P
 2:  a 2005-10-06 2020-12-31  B 2008-04-07 2008-11-05    P
 3:  a 2005-10-06 2020-12-31  C 2008-04-07 2008-11-05    P
 4:  a 2005-10-04 2020-12-31  A 2010-05-10 2011-01-17    P
 5:  a 2005-10-06 2020-12-31  B 2010-05-10 2011-01-17    P
 6:  a 2005-10-06 2020-12-31  C 2010-05-10 2011-01-17    P
 7:  a 2005-10-04 2020-12-31  A 2012-04-24 2013-01-01    P
 8:  a 2005-10-06 2020-12-31  B 2012-04-24 2013-01-01    P
 9:  a 2005-10-06 2020-12-31  C 2012-04-24 2013-01-01    P
10:  a 2012-04-24 2015-07-20  D 2012-04-24 2013-01-01    P

在 Non Equi Joins i.InD_dx 和 i.EndD_dx 的情况下,它不会输出并且 EndD_dx 变为 i.EndD_dx

期望的结果

    id   InD_dx    EndD_dx   dx   i.InD_dx  i.EndD_dx i.dx
 1:  a 2005-10-04 2020-12-31  A 2008-04-07 2008-11-05    P
 2:  a 2005-10-06 2020-12-31  B 2008-04-07 2008-11-05    P
 3:  a 2005-10-06 2020-12-31  C 2008-04-07 2008-11-05    P
 4:  a 2005-10-04 2020-12-31  A 2010-05-10 2011-01-17    P
 5:  a 2005-10-06 2020-12-31  B 2010-05-10 2011-01-17    P
 6:  a 2005-10-06 2020-12-31  C 2010-05-10 2011-01-17    P
 7:  a 2005-10-04 2020-12-31  A 2012-04-24 2013-01-01    P
 8:  a 2005-10-06 2020-12-31  B 2012-04-24 2013-01-01    P
 9:  a 2005-10-06 2020-12-31  C 2012-04-24 2013-01-01    P
10:  a 2012-04-24 2015-07-20  D 2012-04-24 2013-01-01    P
11:  a 2012-05-15 2015-07-20  E 2012-04-24 2013-01-01    P

我不知道我是不是把事情复杂化了。可能通过加入 1:multiple 并在日期之间做出差异我会得到我想要的。这样够高效了吧?

如有任何帮助,我们将不胜感激

提前致谢!

type = 'within' 排除了您正在寻找的一些部分重叠。

尝试:

DT = data.table(id,InD_dx,EndD_dx, dx)

setkey(DT,id,InD_dx,EndD_dx)

foverlaps(DT[dx=='P'],DT[dx!='P'],
          by.x = c("id", "InD_dx", "EndD_dx"),
          by.y = c("id", "InD_dx", "EndD_dx"))

   id     InD_dx    EndD_dx dx   i.InD_dx  i.EndD_dx i.dx
 1:  a 2005-10-04 2020-12-31  A 2008-04-07 2008-11-05    P
 2:  a 2005-10-06 2020-12-31  B 2008-04-07 2008-11-05    P
 3:  a 2005-10-06 2020-12-31  C 2008-04-07 2008-11-05    P
 4:  a 2005-10-04 2020-12-31  A 2010-05-10 2011-01-17    P
 5:  a 2005-10-06 2020-12-31  B 2010-05-10 2011-01-17    P
 6:  a 2005-10-06 2020-12-31  C 2010-05-10 2011-01-17    P
 7:  a 2005-10-04 2020-12-31  A 2012-04-24 2013-01-01    P
 8:  a 2005-10-06 2020-12-31  B 2012-04-24 2013-01-01    P
 9:  a 2005-10-06 2020-12-31  C 2012-04-24 2013-01-01    P
10:  a 2012-04-24 2015-07-20  D 2012-04-24 2013-01-01    P
11:  a 2012-05-15 2015-07-20  E 2012-04-24 2013-01-01    P