How to join 2 tables with maximum matched string in R?

I want to join table 1 and table 2 (left on ColB, right on ColD) on the maximum matched string.

Table 1

ColA  ColB
123   C/O room Hanbur court vaux road
456   House Malveri business park

Table 2

ColD          ColC
Hanbur Court  Lightroom
Malveri park  Office

Output table

ColA  ColB                             ColC
123   C/O room Hanbur court vaux road  Lightroom
456   House Malveri business park      Office

With fuzzyjoin, you can join on string distance:

library(fuzzyjoin)
library(dplyr)
stringdist_inner_join(df1, df2, by = c(ColB = "ColD"),  
     max_dist = 0.5, method = "jaccard") %>%
    select(-ColD)
  ColA                            ColB      ColC
1  123 C/O room Hanbur court vaux road Lightroom
2  456     House Malveri business park    Office

Data

df1 <- structure(list(ColA = c(123L, 456L),
 ColB = c("C/O room Hanbur court vaux road", 
"House Malveri business park")), class = "data.frame", row.names = c(NA, 
-2L))

df2 <- structure(list(ColD = c("Hanbur Court", "Malveri park"),
 ColC = c("Lightroom", 
"Office")), class = "data.frame", row.names = c(NA, -2L))

This one is fairly convoluted, but it gets the job done:

library(dplyr)
library(stringr)
library(tidyr)

# prepare df2 to get pattern for `str_detect` later
df2_new <- df2 %>% 
  separate_rows(ColD, sep = " ") %>% 
  mutate(helper = tolower(ColD)) 

# create pattern to match 
pattern <- paste(df2_new$helper, collapse = "|")

# do the calculations
df1 %>% 
  separate_rows(ColB, sep = " ") %>% 
  mutate(helper = tolower(ColB),
         helper1 = ifelse(str_detect(helper, pattern), 1, 0)) %>% 
  group_by(ColA) %>% 
  mutate(helper = paste(helper[helper1==1], collapse = " "),
         ColB = paste(ColB, collapse = " "), .keep="unused") %>% 
  slice(1) %>% 
  right_join(df2 %>% 
               mutate(helper = tolower(ColD)), by="helper") %>% 
  select(ColA, ColB, ColC)
   ColA ColB                            ColC     
  <int> <chr>                           <chr>    
1   123 C/O room Hanbur court vaux road Lightroom
2   456 House Malveri business park     Office
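
For comparison, the "maximum matched string" idea can also be sketched in base R without extra packages. This is only a rough illustration, not either answerer's method: it assumes that "maximum match" means "most shared whole words, ignoring case", and the helpers `tokens` and `overlap` are hypothetical names introduced here.

```r
# Example data, same values as in the "Data" section
df1 <- data.frame(
  ColA = c(123L, 456L),
  ColB = c("C/O room Hanbur court vaux road", "House Malveri business park"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ColD = c("Hanbur Court", "Malveri park"),
  ColC = c("Lightroom", "Office"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split a string into lower-case, whitespace-separated words
tokens <- function(x) strsplit(tolower(x), "\\s+")[[1]]

# Hypothetical helper: number of distinct words of `key` that also occur in `text`
overlap <- function(text, key) length(intersect(tokens(text), tokens(key)))

# Score every ColB/ColD pair: rows follow df1, columns follow df2
scores <- outer(df1$ColB, df2$ColD, Vectorize(overlap))

# For each df1 row, pick the df2 row with the highest score (first one on ties)
best <- max.col(scores, ties.method = "first")

# Rows that share no word with any ColD get NA instead of an arbitrary match
best[scores[cbind(seq_along(best), best)] == 0] <- NA

# Attach the matched ColC to the left table
result <- df1
result$ColC <- df2$ColC[best]
print(result)
```

Because every row of df1 is scored against every row of df2, this only suits small tables; for larger ones the fuzzyjoin approach is likely the better starting point.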