How to find the best-matching dates between two data frames in R?

I have two data frames, insitu and model:

dput(head(insitu,20))
structure(list(ID = c("AUR", "AUR", "AUR", "AUR", "AUR", "AUR", 
"LAM", "LAM", "LAM", "LAM", "LAM", "LAM"), D_SOS = structure(c(16929, 
17149, 17422, 17850, 18389, 18202, 17044, 16744, 17300, 17522, 
18027, 18198), class = "Date"), D_EOS = structure(c(17067, 17353, 
17712, 18082, 18516, 18360, 17123, 17002, 17414, 17722, 18148, 
18446), class = "Date")), row.names = c(NA, -12L), class = c("tbl_df", 
"tbl", "data.frame"))

dput(head(model,20))
structure(list(ID = c("AUR", "AUR", "AUR", "AUR", "AUR", "AUR", 
"AUR", "AUR", "LAM", "LAM", "LAM", "LAM", "LAM", "LAM", "LAM"
), EVI_SOS = structure(c(16934, 17137, 17378, 17605, 17862, 18003, 
18192, 18395, 16744, 17134, 17278, 17518, 17725, 18004, 18200
), class = "Date"), EVI_EOS = structure(c(17074, 17361, 17591, 
17798, 17994, 18096, 18376, 18594, 17106, 17252, 17431, 17705, 
17862, 18173, 18549), class = "Date")), row.names = c(NA, -15L
), class = c("tbl_df", "tbl", "data.frame"))

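What I want to do: find the best-matching dates between the two data frames for selected columns and the corresponding rows. In other words, for the rows of insitu whose ID is AUR, which dates in column D_SOS best match the dates in column EVI_SOS for the rows of model whose ID is AUR? The same must be done for the LAM rows.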
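A minimal base-R sketch of the plain "nearest date within the same ID" idea, assuming "best match" simply means the smallest absolute difference in days. The helper name nearest_date is hypothetical, and the sample data is rebuilt from the dput() output above.

```r
# Sample data rebuilt from the question's dput() output
insitu <- data.frame(
  ID    = rep(c("AUR", "LAM"), each = 6),
  D_SOS = structure(c(16929, 17149, 17422, 17850, 18389, 18202,
                      17044, 16744, 17300, 17522, 18027, 18198), class = "Date"),
  D_EOS = structure(c(17067, 17353, 17712, 18082, 18516, 18360,
                      17123, 17002, 17414, 17722, 18148, 18446), class = "Date"),
  stringsAsFactors = FALSE
)
model <- data.frame(
  ID      = rep(c("AUR", "LAM"), times = c(8, 7)),
  EVI_SOS = structure(c(16934, 17137, 17378, 17605, 17862, 18003, 18192, 18395,
                        16744, 17134, 17278, 17518, 17725, 18004, 18200), class = "Date"),
  EVI_EOS = structure(c(17074, 17361, 17591, 17798, 17994, 18096, 18376, 18594,
                        17106, 17252, 17431, 17705, 17862, 18173, 18549), class = "Date"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each date in x, return the candidate date with the
# smallest absolute difference in days (which.min keeps the first one on ties).
nearest_date <- function(x, candidates) {
  xn <- as.numeric(x)
  cn <- as.numeric(candidates)
  idx <- vapply(xn, function(d) which.min(abs(d - cn)), integer(1))
  candidates[idx]
}

# Look for matches only among model rows with the same ID
res <- insitu
res$EVI_SOS <- res$D_SOS  # placeholders with the Date class, overwritten below
res$EVI_EOS <- res$D_EOS
for (id in unique(res$ID)) {
  rows <- res$ID == id
  cand <- model$ID == id
  res$EVI_SOS[rows] <- nearest_date(res$D_SOS[rows], model$EVI_SOS[cand])
  res$EVI_EOS[rows] <- nearest_date(res$D_EOS[rows], model$EVI_EOS[cand])
}
res[, c("ID", "D_SOS", "EVI_SOS", "D_EOS", "EVI_EOS")]
```

With the sample data, EVI_SOS comes out exactly as in the desired output. For EOS, however, the two LAM rows with D_EOS 2016-11-18 and 2016-07-20 are both matched to 2016-11-01, because a nearest-date lookup is free to reuse the same model date.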

An example of the desired output:

dput(head(output,20))
structure(list(ID = c("AUR", "AUR", "AUR", "AUR", "AUR", "AUR", 
"LAM", "LAM", "LAM", "LAM", "LAM", "LAM"), D_SOS = structure(c(16929, 
17149, 17422, 17850, 18389, 18202, 17044, 16744, 17300, 17522, 
18027, 18198), class = "Date"), EVI_SOS = structure(c(16934, 
17137, 17378, 17862, 18395, 18192, 17134, 16744, 17278, 17518, 
18004, 18200), class = "Date"), D_EOS = structure(c(17067, 17353, 
17712, 18082, 18516, 18360, 17123, 17002, 17414, 17722, 18148, 
18446), class = "Date"), EVI_EOS = structure(c(17074, 17361, 
17798, 18096, 18594, 18376, 17252, 17106, 17431, 17705, 18173, 
18549), class = "Date")), row.names = c(NA, -12L), class = c("tbl_df", 
"tbl", "data.frame"))

Printed, it looks like this:

   ID    D_SOS        EVI_SOS     D_EOS        EVI_EOS  
1  AUR   2016-05-08   2016-05-13  2016-09-23   2016-09-30 
2  AUR   2016-12-14   2016-12-02  2017-07-06   2017-07-14       
3  AUR   2017-09-13   2017-07-31  2018-06-30   2018-09-24
4  AUR   2018-11-15   2018-11-27  2019-07-05   2019-07-19
5  AUR   2020-05-07   2020-05-13  2020-09-11   2020-11-28
6  AUR   2019-11-02   2019-10-23  2020-04-08   2020-04-24
7  LAM   2016-08-31   2016-11-29  2016-11-18   2017-03-27
8  LAM   2015-11-05   2015-11-05  2016-07-20   2016-11-01
9  LAM   2017-05-14   2017-04-22  2017-09-05   2017-09-22
10 LAM   2017-12-22   2017-12-18  2018-07-10   2018-06-23
11 LAM   2019-05-11   2019-04-18  2019-09-09   2019-10-04
12 LAM   2019-10-29   2019-10-31  2020-07-03   2020-10-14

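Basically, 6 of the 8 AUR dates in model will be matched, because insitu has only 6 AUR dates. For LAM, model has 7 dates but insitu has 6, so 6 is the number of dates to match. In the output, each insitu column (e.g. D_SOS) comes first, followed by the corresponding model column (e.g. EVI_SOS) holding the matched dates.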
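The requirement that only 6 of the 8 AUR model dates are used suggests that each model date should be matched at most once. Below is a minimal base-R sketch under that assumption: "best match" is taken to mean the one-to-one pairing that minimises the total absolute day difference within each ID, with SOS and EOS matched independently of each other. Both this interpretation and the helper name match_one_to_one are assumptions, not part of the question. Because a minimum-cost pairing of points on a line can always be chosen without crossings, sorting both vectors reduces the problem to a small dynamic programme.

```r
# Sample data rebuilt from the question's dput() output
insitu <- data.frame(
  ID    = rep(c("AUR", "LAM"), each = 6),
  D_SOS = structure(c(16929, 17149, 17422, 17850, 18389, 18202,
                      17044, 16744, 17300, 17522, 18027, 18198), class = "Date"),
  D_EOS = structure(c(17067, 17353, 17712, 18082, 18516, 18360,
                      17123, 17002, 17414, 17722, 18148, 18446), class = "Date"),
  stringsAsFactors = FALSE
)
model <- data.frame(
  ID      = rep(c("AUR", "LAM"), times = c(8, 7)),
  EVI_SOS = structure(c(16934, 17137, 17378, 17605, 17862, 18003, 18192, 18395,
                        16744, 17134, 17278, 17518, 17725, 18004, 18200), class = "Date"),
  EVI_EOS = structure(c(17074, 17361, 17591, 17798, 17994, 18096, 18376, 18594,
                        17106, 17252, 17431, 17705, 17862, 18173, 18549), class = "Date"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pair each date in x with a distinct date in y
# (requires length(y) >= length(x)) so that the total absolute difference
# in days is minimal.
match_one_to_one <- function(x, y) {
  stopifnot(length(y) >= length(x))
  ox <- order(x)
  oy <- order(y)
  xs <- as.numeric(x)[ox]
  ys <- as.numeric(y)[oy]
  n <- length(xs)
  m <- length(ys)
  # cost[i + 1, j + 1] = minimal total difference when the first i sorted x
  # dates are paired with distinct dates among the first j sorted y dates
  cost <- matrix(Inf, n + 1, m + 1)
  cost[1, ] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      # either leave y_j unused, or pair x_i with y_j
      cost[i + 1, j + 1] <- min(cost[i + 1, j], cost[i, j] + abs(xs[i] - ys[j]))
    }
  }
  # Walk back through the table to recover which y each sorted x was paired with
  pick <- integer(n)
  i <- n
  j <- m
  while (i > 0) {
    if (cost[i + 1, j + 1] == cost[i + 1, j]) {
      j <- j - 1  # y_j was left unused
    } else {
      pick[i] <- j
      i <- i - 1
      j <- j - 1
    }
  }
  matched <- y[oy[pick]]  # matches in sorted-x order
  matched[order(ox)]      # back to the original order of x
}

res <- insitu
res$EVI_SOS <- res$D_SOS  # placeholders with the Date class, overwritten below
res$EVI_EOS <- res$D_EOS
for (id in unique(res$ID)) {
  rows <- res$ID == id
  cand <- model$ID == id
  # SOS and EOS are matched separately, each one-to-one within the ID
  res$EVI_SOS[rows] <- match_one_to_one(res$D_SOS[rows], model$EVI_SOS[cand])
  res$EVI_EOS[rows] <- match_one_to_one(res$D_EOS[rows], model$EVI_EOS[cand])
}
res <- res[, c("ID", "D_SOS", "EVI_SOS", "D_EOS", "EVI_EOS")]
res
```

With the sample data this reproduces the EVI_SOS and EVI_EOS columns of the desired output, including the LAM rows with D_EOS 2016-11-18 and 2016-07-20, which receive 2017-03-27 and 2016-11-01 respectively. The loops cost O(n × m) per ID, which is negligible at this size.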

Any help would be appreciated.

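library(data.table)

# Convert both data frames to data.tables by reference
setDT(insitu)
setDT(model)

# Copy the start-of-season dates into a shared join column so that both
# D_SOS and EVI_SOS are kept in the result
insitu[, key := D_SOS]
model[, key := EVI_SOS]

# Key on ID, then date: the rolling join acts on the last key column
setkey(insitu, ID, key)
setkey(model, ID, key)

# For each insitu row, take the model row with the same ID whose EVI_SOS is
# nearest to D_SOS; EVI_EOS comes from that same model row
model[insitu, roll = "nearest"][, .(ID, D_SOS, EVI_SOS, D_EOS, EVI_EOS)]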

#      ID      D_SOS    EVI_SOS      D_EOS    EVI_EOS
#  1: AUR 2016-05-08 2016-05-13 2016-09-23 2016-09-30
#  2: AUR 2016-12-14 2016-12-02 2017-07-06 2017-07-14
#  3: AUR 2017-09-13 2017-07-31 2018-06-30 2018-03-01
#  4: AUR 2018-11-15 2018-11-27 2019-07-05 2019-04-08
#  5: AUR 2019-11-02 2019-10-23 2020-04-08 2020-04-24
#  6: AUR 2020-05-07 2020-05-13 2020-09-11 2020-11-28
#  7: LAM 2015-11-05 2015-11-05 2016-07-20 2016-11-01
#  8: LAM 2016-08-31 2016-11-29 2016-11-18 2017-03-27
#  9: LAM 2017-05-14 2017-04-22 2017-09-05 2017-09-22
# 10: LAM 2017-12-22 2017-12-18 2018-07-10 2018-06-23
# 11: LAM 2019-05-11 2019-04-18 2019-09-09 2019-10-04
# 12: LAM 2019-10-29 2019-10-31 2020-07-03 2020-10-14