R:基于 "OR" 语句的连接

R: Joins based on "OR" Statements

我正在使用 R 编程语言。假设我有以下两个表:

table_1 = data.frame(id = c("123", "123", "125", "125"), id2 = c("11", "12", "14", "13"),
date_1 = c("2010-01-31","2010-01-31", "2015-01-31", "2018-01-31" ))

table_1$id = as.factor(table_1$id)
table_1$id2 = as.factor(table_1$id2)
table_1$date_1 = as.factor(table_1$date_1)

table_2 = data.frame(id = c("123", "123", "125", "125"), id2 = c("111", "112", "14", "113"),
date_2 = c("2009-01-31","2010-01-31", "2010-01-31", "2010-01-31" ),
date_3 = c("2011-01-31","2010-01-31", "2020-01-31", "2020-01-31" ))


table_2$id = as.factor(table_2$id)
table_2$id2 = as.factor(table_2$id2)
table_2$date_2 = as.factor(table_2$date_2)
table_2$date_3 = as.factor(table_2$date_3)

如果满足以下两个条件中的任何一个(即 if Condition_1 = 真或 Condition_2 = 真,然后“加入”)

Condition_1

  1. if table_1$id = table_2$id

  1. if table_1$date BETWEEN(table_2$date_2,table_2$date_3)

Condition_2

  1. if table_1$id2 = table_2$id2

  1. if table_1$date BETWEEN(table_2$date_2,table_2$date_3)

我已经尝试过: 我知道如何单独执行这两个连接,例如:

library(sqldf)

#Condition_1


final = sqldf("select a.*, b.*
           
       from table_1 a left join table_2 b
       on a.id = b.id and 
          a.date_1 between 
              b.date_2 and
              b.date_3") 


#Condition_2


final_2 = sqldf("select a.*, b.*
           
       from table_1 a left join table_2 b
       on a.id2 = b.id2 and 
          a.date_1 between 
              b.date_2 and
              b.date_3") 

然后我可以将这些文件(“final_3”)“绑定”在一起(并删除完全重复的行):

final_3 = rbind(final, final_2)

final_3 = final_3[!duplicated(final_3[c(1,2,3,4,5,6,7)]),]

我的问题: 有没有办法在一个步骤中将这两个表合并在一起,而不是两个单独的步骤?这可以使用 Base R 或 DPLYR 完成吗?

谢谢!

这可以在单个 SQL 语句中完成,如下所示。

library(sqldf)

sqldf("select distinct * 
  from table_1 a left join table_2 b
  on (a.date_1 between b.date_2 and b.date_3) and 
     (a.id = b.id or a.id2 = b.id2)")