R function for finding at least 2 words that match between 2 strings (applied over 2 vectors of strings)?
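I have two sets of strings, Char and Char2 in this example. I'm trying to find out whether Char contains at least 2 words from Char2 (any two words can match). I haven't gotten to the "at least 2 words" part yet; first I need to figure out how to match any word between each pair of strings. Any help would be greatly appreciated.
I've tried a few different approaches with the stringr package; see below. I tried a solution similar to the one Robert gave in his answer to this question: Detect multiple strings with dplyr and stringr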
library(stringr)  # needed for str_detect()

shopping_list <- as.data.frame(c("good apples", "bag of apples", "bag of sugar", "milk x2"))
colnames(shopping_list) <- "Char"
shopping_list2 <- as.data.frame(c("good pears", "bag of sugar", "bag of flour", "sour milk x2"))
colnames(shopping_list2) <- "Char2"
shop = cbind(shopping_list , shopping_list2)
shop$Char = as.character(shop$Char)
shop$Char2 = as.character(shop$Char2)
# First attempt
sapply(shop$Char, function(x) any(sapply(shop$Char2, str_detect, string = x)))
# Second attempt
str_detect(shop$Char, paste(shop$Char2, collapse = '|'))
I get these results:
> sapply(shop$Char, function(x) any(sapply(shop$Char2, str_detect, string = x)))
  good apples bag of apples  bag of sugar       milk x2
        FALSE         FALSE          TRUE         FALSE
> str_detect(shop$Char, paste(shop$Char2, collapse = '|'))
[1] FALSE FALSE  TRUE FALSE
But I'm looking for these results:
FALSE TRUE TRUE TRUE
1) FALSE because only 1 word matches
2) TRUE because "bag of" is in both
3) TRUE because "bag of" is in both
4) TRUE because "milk x2" is in both
Here is a function that can help:
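match_test <- function (string1, string2) {
  # Split each string into words on single spaces
  words1 <- unlist(strsplit(string1, ' '))
  words2 <- unlist(strsplit(string2, ' '))
  # Distinct words that appear in both strings
  common_words <- intersect(words1, words2)
  # TRUE when at least 2 distinct words are shared
  length(common_words) > 1
}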
Here is an example:
string1 <- c("good apples" , "bag of apples", "bag of sugar", "milk x2")
string2 <- c("good pears" , "bag of sugar", "bag of flour", "sour milk x2")
vapply(seq_along(string1), function (k) match_test(string1[k], string2[k]), logical(1))
# [1] FALSE TRUE TRUE TRUE
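If the threshold might change, or the strings may contain mixed case or repeated spaces, the same idea could be generalized along these lines. This is only a sketch in base R: count_common_words() and has_min_common_words() are hypothetical names made up for illustration, and the min_words and ignore_case arguments are assumptions that are not part of the original question.

```r
# Hypothetical helpers (illustrative names, not from any package).

# Count the distinct words shared by a[i] and b[i], pair by pair.
count_common_words <- function(a, b, ignore_case = TRUE) {
  if (ignore_case) {
    a <- tolower(a)
    b <- tolower(b)
  }
  # Split on runs of whitespace so repeated spaces do not create empty "words"
  words_a <- strsplit(trimws(a), "\\s+")
  words_b <- strsplit(trimws(b), "\\s+")
  # Pair the i-th word list of a with the i-th word list of b
  mapply(function(x, y) length(intersect(x, y)), words_a, words_b)
}

# TRUE where a pair shares at least min_words distinct words.
has_min_common_words <- function(a, b, min_words = 2, ...) {
  count_common_words(a, b, ...) >= min_words
}

string1 <- c("good apples", "bag of apples", "bag of sugar", "milk x2")
string2 <- c("good pears", "bag of sugar", "bag of flour", "sour milk x2")

counts <- count_common_words(string1, string2)
result <- has_min_common_words(string1, string2)
print(counts)
# [1] 1 2 2 2
print(result)
# [1] FALSE  TRUE  TRUE  TRUE
```

With the data frame from the question, this could be called as has_min_common_words(shop$Char, shop$Char2). Note that intersect() works on distinct words, so a word repeated within one string is counted only once.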