Partial String Match based on a list

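I want to do a partial string match across an entire list, then create a data frame that shows the proper name next to each abbreviated name.

I'm sure this is easy, but I haven't found how to do it yet.

For example: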


library(data.table)


list_abbreviated = c("KF Chicken", "CHI Wendys", "CAL InandOut")

list_proper = c("Kentucky Fried Chicken", "Chicago Wendys", "California InandOut", "Ontario Whataburger")

# I've tried

Pattern = paste(list_proper, collapse="|")

DT_result = data.table(list_abbreviated, result=grepl(Pattern, list_abbreviated ))
DT_result

# This is the result

   list_abbreviated result
1:       KF Chicken  FALSE
2:       CHI Wendys  FALSE
3:     CAL InandOut  FALSE

# I tried other options using %like% to no avail either.

# This is the output I am looking for

  list_abbreviated result            list_proper
1       KF Chicken   TRUE Kentucky Fried Chicken
2       CHI Wendys   TRUE         Chicago Wendys
3     CAL InandOut   TRUE    California InandOut

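One option is to extract the last word of each abbreviated name and use it as the key for a partial join. We can then use regex_inner_join from fuzzyjoin to merge the two data tables, keeping the rows where the proper name contains that last word.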

library(data.table)
library(stringi)
library(fuzzyjoin)

# Use the last word of each abbreviated name as the join key
list_abbreviated = data.table(list_abbreviated = c("KF Chicken", "CHI Wendys", "CAL InandOut"))
list_abbreviated[, limited := stri_extract_last_words(list_abbreviated)]

list_proper = data.table(list_proper = c("Kentucky Fried Chicken", "Chicago Wendys", "California InandOut", "Ontario Whataburger"))

# Keep rows where list_proper matches the regex stored in limited, then drop the helper column
DT_result <- data.table(regex_inner_join(list_proper, list_abbreviated, by = c("list_proper" = "limited")))
DT_result[, limited := NULL]

Output

              list_proper list_abbreviated
1: Kentucky Fried Chicken       KF Chicken
2:         Chicago Wendys       CHI Wendys
3:    California InandOut     CAL InandOut
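
For context, the attempt in the question returns FALSE everywhere because grepl(Pattern, list_abbreviated) searches each abbreviated name for a full proper name, and none of the abbreviations contains one. The search has to run the other way, from a piece of the abbreviation into the proper names. As a rough sketch of the same last-word idea in base R, without fuzzyjoin or stringi, something like the following should reproduce the layout asked for in the question. It assumes the last whitespace-separated word of each abbreviation appears literally in exactly the proper name you want; the names last_word and match_idx are made up for this illustration, and when several proper names contain the word only the first is kept.

```r
list_abbreviated <- c("KF Chicken", "CHI Wendys", "CAL InandOut")
list_proper <- c("Kentucky Fried Chicken", "Chicago Wendys",
                 "California InandOut", "Ontario Whataburger")

# Key each abbreviation by its last whitespace-separated word
# (assumption: that word is what identifies the proper name).
last_word <- sub("^.*\\s", "", list_abbreviated)

# For each key, index of the first proper name containing it (NA if none).
match_idx <- vapply(
  last_word,
  function(w) which(grepl(w, list_proper, fixed = TRUE))[1],
  integer(1),
  USE.NAMES = FALSE
)

# Assemble the table: abbreviation, whether a match was found, matched name.
result <- data.frame(
  list_abbreviated = list_abbreviated,
  result = !is.na(match_idx),
  list_proper = list_proper[match_idx],
  stringsAsFactors = FALSE
)
print(result)
```

Unlike the inner join, this keeps abbreviations that have no match: they show up with result FALSE and NA in list_proper. fixed = TRUE treats the last word as a literal string, so regex metacharacters in a name cannot change the match.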