Partial String Match based on a list
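I want to do a partial string match across an entire list, then create a data frame that shows the proper name next to each abbreviated name.
I'm sure this is easy, but I haven't found how to do it yet.
For example: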
library(data.table)
list_abbreviated = c("KF Chicken", "CHI Wendys", "CAL InandOut")
list_proper = c("Kentucky Fried Chicken", "Chicago Wendys", "California InandOut", "Ontario Whataburger")
# I've tried
Pattern = paste(list_proper, collapse="|")
DT_result = data.table(list_abbreviated, result=grepl(Pattern, list_abbreviated ))
DT_result
# This is the result
#    list_abbreviated result
# 1:       KF Chicken  FALSE
# 2:       CHI Wendys  FALSE
# 3:     CAL InandOut  FALSE
# I tried other options using %like% to no avail either.
# This is the output I am looking for
#   list_abbreviated result            list_proper
# 1       KF Chicken   TRUE Kentucky Fried Chicken
# 2       CHI Wendys   TRUE         Chicago Wendys
# 3     CAL InandOut   TRUE    California InandOut
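One option is to extract the last word of each abbreviated name and use it for a partial join. We can then use regex_inner_join from fuzzyjoin to do the partial join and merge the two data tables.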
library(data.table)
library(stringi)
library(fuzzyjoin)
list_abbreviated = data.table(list_abbreviated = c("KF Chicken", "CHI Wendys", "CAL InandOut"))
# Keep only the last word of each abbreviated name to use as the join pattern
list_abbreviated[, limited := stri_extract_last_words(list_abbreviated)]
list_proper = data.table(list_proper = c("Kentucky Fried Chicken", "Chicago Wendys", "California InandOut", "Ontario Whataburger"))
# Keep rows where list_proper matches the regex stored in limited
DT_result <- data.table(regex_inner_join(list_proper, list_abbreviated, by = c("list_proper" = "limited")))
# Drop the helper column
DT_result[, limited := NULL]
Output
              list_proper list_abbreviated
1: Kentucky Fried Chicken       KF Chicken
2:         Chicago Wendys       CHI Wendys
3:    California InandOut     CAL InandOut
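For what it's worth, the grepl attempt in the question returns FALSE because it searches for the full proper names inside the abbreviated strings, and none of the abbreviations contains a full proper name. If you would rather avoid the extra packages, the same last-word idea can be sketched in base R. This is only a rough sketch: it assumes the last word of each abbreviation appears literally in exactly one proper name, and the helper names last_word and match_idx are made up for illustration.

```r
# Sketch only: assumes the last word of each abbreviation identifies
# a single proper name. last_word and match_idx are illustrative names.
list_abbreviated <- c("KF Chicken", "CHI Wendys", "CAL InandOut")
list_proper <- c("Kentucky Fried Chicken", "Chicago Wendys",
                 "California InandOut", "Ontario Whataburger")

# Strip everything up to the last whitespace, leaving the final word
last_word <- sub(".*\\s", "", list_abbreviated)

# For each last word, index of the first proper name containing it (NA if none)
match_idx <- vapply(
  last_word,
  function(w) {
    hits <- grep(w, list_proper, fixed = TRUE)
    if (length(hits) > 0L) hits[1L] else NA_integer_
  },
  integer(1),
  USE.NAMES = FALSE
)

# Assemble the layout asked for in the question
result <- data.frame(
  list_abbreviated = list_abbreviated,
  result = !is.na(match_idx),
  list_proper = list_proper[match_idx],
  stringsAsFactors = FALSE
)
print(result)
```

Abbreviations with no match get result = FALSE and NA in list_proper rather than being dropped, which differs from the inner join above.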