Taking the minimum difference of dates between two data frames for the same id

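My problem is quite simple. I have 2 data frames, each with a date column (%Y-%m-%d) and an ID column. One has only one row per ID; the other has multiple rows with the same ID. For each ID, I want to get the row whose date differs least from the date in the first data frame. Let me explain it better with an example: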

df1 (single value of colA per row):

+-------+------------+------+------+-------+-------+
| colA  |    colB    | colC | colD | colE  | colF  |
+-------+------------+------+------+-------+-------+
| 3000  | 2011-01-20 |    2 | 3.43 | 2.01  | 1.63  |
| 3001  | 2012-04-06 |    1 | 1.12 | -0.63 | -1.16 |
| 3002  | 2012-04-24 |    2 | 2.28 | -0.18 | -0.12 |
| 3003  | 2012-04-13 |    2 | 1.27 | -0.51 | -0.82 |
| 3004  | 2011-08-24 |    5 | 5.30 | 2.68  | 2.10  |
| 3006  | 2011-08-02 |    2 | 2.12 | -0.27 | -2.60 |
+-------+------------+------+------+-------+-------+

df2 (multiple rows per value of the first column, colX):

+------+---------------+----------+
| colX |     colY      | colZ     |
+------+---------------+----------+
| 3000 | 2011-02-01    |        0 |
| 3000 | 2012-03-01    |        0 |
| 3000 | 2013-02-01    |        0 |
| 3000 | 2014-03-01    |        1 |
| 3000 | 2015-03-01    |        0 |
| 3000 | 2016-04-01    |        0 |
| 3002 | 2011-03-01    |        1 |
| 3002 | 2011-08-01    |        1 |
| 3002 | 2012-04-01    |        0 |
+------+---------------+----------+

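In this case, I look at the first value in colA (df1) and compute the month differences between 2011-01-20 and all the dates for 3000 in df2 (2011-02-01, 2012-03-01, etc.), so the first 6 rows. I keep only the minimum difference, which here is the first one (2011-02-01), a difference of almost one month. So in the end df1 should have 3 new columns (Y, Z and the difference): the date from df2 with the minimum difference, the 0/1 value of Z, and the difference in days between the 2 dates.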

For example, for 3000 (I take the absolute value of the difference):

3000  2011-01-20  2  3.43  2.01  1.63  2011-02-01 0 12

What function should I use? apply? ddply?

Thanks in advance

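You can try this (pay attention to how you define the date operation, since that is not clear in your question; here the rows are filtered on the difference in months, and the difference in days is reported alongside):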

library(tidyverse)
library(lubridate)

#Data
df1 <- structure(list(colA = c(3000L, 3001L, 3002L, 3003L, 3004L, 3006L
), colB = c("2011-01-20", "2012-04-06", "2012-04-24", "2012-04-13", 
"2011-08-24", "2011-08-02"), colC = c(2L, 1L, 2L, 2L, 5L, 2L), 
    colD = c(3.43, 1.12, 2.28, 1.27, 5.3, 2.12), colE = c(2.01, 
    -0.63, -0.18, -0.51, 2.68, -0.27), colF = c(1.63, -1.16, 
    -0.12, -0.82, 2.1, -2.6)), class = "data.frame", row.names = c(NA, 
-6L))
df2 <- structure(list(colX = c(3000L, 3000L, 3000L, 3000L, 3000L, 3000L, 
3002L, 3002L, 3002L), colY = c("2011-02-01", "2012-03-01", "2013-02-01", 
"2014-03-01", "2015-03-01", "2016-04-01", "2011-03-01", "2011-08-01", 
"2012-04-01"), colZ = c(0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L)), class = "data.frame", row.names = c(NA, 
-9L))

#Code
#Compute month and day differences, keep the rows with the smallest month difference per id
dfo <- df2 %>% rename(colA=colX) %>% left_join(df1, by='colA') %>% 
  mutate(Diff=abs(12*(year(as.Date(colB))-year(as.Date(colY)))+month(as.Date(colB))-month(as.Date(colY))),
         Diffdays=abs(as.Date(colB)-as.Date(colY))) %>% group_by(colA) %>%
  filter(Diff==min(Diff))
#Format
vars <- c(names(df1),names(df2)[-1],'Diff','Diffdays')
#Data
dfo %>% select(all_of(vars))

# A tibble: 2 x 10
# Groups:   colA [2]
   colA colB        colC  colD  colE  colF colY        colZ  Diff Diffdays
  <int> <chr>      <int> <dbl> <dbl> <dbl> <chr>      <int> <dbl> <drtn>  
1  3000 2011-01-20     2  3.43  2.01  1.63 2011-02-01     0     1 12 days 
2  3002 2012-04-24     2  2.28 -0.18 -0.12 2012-04-01     0     0 23 days 

Please check whether this is what you want.
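If the minimum should instead be taken on the difference in days, and ids from df1 that have no match in df2 should be kept, a variant along the following lines may work. This is only a sketch under those assumptions, not part of the code above: it starts from df1, uses a trimmed copy of the sample data, and the name `res` is made up for illustration.

```r
library(dplyr)

# Trimmed copy of the sample data (only the columns needed here)
df1 <- data.frame(
  colA = c(3000L, 3001L, 3002L),
  colB = c("2011-01-20", "2012-04-06", "2012-04-24")
)
df2 <- data.frame(
  colX = c(3000L, 3000L, 3002L, 3002L, 3002L),
  colY = c("2011-02-01", "2012-03-01", "2011-03-01", "2011-08-01", "2012-04-01"),
  colZ = c(0L, 0L, 1L, 1L, 0L)
)

res <- df1 %>%
  # Start from df1 so that ids without any row in df2 are kept (with NA)
  left_join(df2, by = c("colA" = "colX")) %>%
  # Absolute difference in days between the two dates
  mutate(Diffdays = abs(as.numeric(as.Date(colB) - as.Date(colY)))) %>%
  group_by(colA) %>%
  # Keep exactly one row per id: the one with the smallest day difference
  slice_min(Diffdays, n = 1, with_ties = FALSE) %>%
  ungroup()

res
```

For id 3000 this keeps the 2011-02-01 row with a difference of 12 days, as in the example from the question. With `with_ties = FALSE`, only one row is returned when two dates are equally close; drop that argument if all tied rows are wanted.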