Merge two data frames by rows using common words
df1 <- data.frame(freetext = c("open until monday night", "one more time to insert your coin"), numid = c(291,312))
df2 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3,5))
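I would like to merge the two data frames on the freetext column. However, the texts are not exactly the same: some words have been removed or added.
Is there any option to find the maximum number of identical words between rows and merge based on that?
Here is an example of the expected output: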
df3 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3,5), numid = c(291,312))
Perhaps you can look at the stringdist joins from fuzzyjoin and use a max_dist argument that suits your data.
fuzzyjoin::stringdist_inner_join(df1, df2, by = 'freetext', max_dist = 10)
#  freetext.x                        numid freetext.y                   aid
#  <chr>                             <dbl> <chr>                      <dbl>
#1 open until monday night             291 open until night               3
#2 one more time to insert your coin   312 one time to insert your be     5
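The stringdist joins compare strings by character-level distance, so max_dist has to be tuned by hand. If you want to match literally on the number of shared words, as the question asks, something like the following base R sketch might work. The helper names words_of and shared_words are made up for this illustration, and it assumes that words are separated by whitespace, that case does not matter, and that every row of df2 has at least one reasonable match in df1.

```r
df1 <- data.frame(freetext = c("open until monday night", "one more time to insert your coin"), numid = c(291, 312))
df2 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3, 5))

# Hypothetical helper: split a string into its unique lower-case words.
words_of <- function(x) unique(strsplit(tolower(x), "\\s+")[[1]])

# Hypothetical helper: count the words that two strings have in common.
shared_words <- function(a, b) length(intersect(words_of(a), words_of(b)))

# overlap[i, j] = number of words shared by row i of df2 and row j of df1.
overlap <- outer(df2$freetext, df1$freetext, Vectorize(shared_words))

# For each row of df2, take the df1 row with the most shared words
# (ties are resolved in favour of the first candidate).
best <- max.col(overlap, ties.method = "first")

df3 <- data.frame(freetext = df2$freetext, aid = df2$aid, numid = df1$numid[best])
```

Note that this always picks some row of df1, even when the best overlap is zero, so on real data you would probably want to drop matches whose overlap falls below a threshold you choose. It also builds the full df2-by-df1 matrix, which is fine for small tables but may be slow for large ones.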