Logical string matching

Question

What is the best way to match a set of words against my sentences? Here is a small example:

words <- c("apple", "pear", "grape")
sentences <- c("I have an apple and a pear", "Grape is my favorite", "I don't like pear")

Ideally, the output would look like this:

count  sentence 
2      "I have an apple and a pear"
1      "Grape is my favorite"
1      "I don't like pear"

I tried using str_count, but to no avail. Thanks for your help!

Answer 1

library(stringr)
str_count(sentences, paste0("(?i)\\b(", paste0(words, collapse = "|"), ")\\b"))
[1] 2 1 1

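How this works:

(?i): makes the pattern matching case-insensitive.
\\b and \\b: these are the regex word boundary \b, with the backslash doubled because it sits inside an R string. They ensure the words are matched as whole words (without \\b you could end up matching a word that contains one of your words but is itself a different word, e.g. grapple, which contains apple).
( and ): these form a group (a capturing one; the non-capturing form would be (?: ... )) whose content is the words separated by, or if you prefer collapsed with, the vertical bar |, which means 'OR'.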
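To make the role of each piece concrete, here is a minimal sketch that builds the pattern with and without the word boundaries and compares the counts. The variable names (with_boundaries, without_boundaries, examples) and the two example strings are made up for illustration; they are not part of the original question.

```r
library(stringr)

words <- c("apple", "pear", "grape")

# Same alternation, once with word boundaries and once without.
# Backslashes are doubled because they sit inside an R string literal.
with_boundaries    <- paste0("(?i)\\b(", paste0(words, collapse = "|"), ")\\b")
without_boundaries <- paste0("(?i)(", paste0(words, collapse = "|"), ")")

# Show the regex as the regex engine receives it (single backslashes).
cat(with_boundaries, "\n")

# Illustrative inputs (assumed, not from the question).
examples <- c("I grapple with pears", "An APPLE a day")

# With boundaries, only whole words count.
str_count(examples, with_boundaries)
# [1] 0 1

# Without boundaries, "grapple" and "pears" are counted as hits too.
str_count(examples, without_boundaries)
# [1] 2 1
```

Note that the boundary version also skips plurals such as "pears"; if those should count, the word list itself would need to include them.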

If you want to put this in a data frame:

df <- data.frame(
  sentences = sentences,
  count = str_count(sentences, paste0("(?i)\\b(", paste0(words, collapse = "|"), ")\\b")))

Result:

  df
                     sentences count
  1 I have an apple and a pear     2
  2       Grape is my favorite     1
  3          I don't like pear     1
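If the count should come first, as in the layout the question asks for, the same approach could be wrapped in a small helper. This is only a sketch: count_words is a hypothetical name, not a stringr function, and it assumes the words contain no regex metacharacters (such as . or +), which would otherwise need escaping.

```r
library(stringr)

# Hypothetical helper (not part of stringr): counts whole-word,
# case-insensitive occurrences of any of `words` in each sentence.
# Assumes `words` contains no regex metacharacters.
count_words <- function(sentences, words) {
  pattern <- paste0("(?i)\\b(", paste0(words, collapse = "|"), ")\\b")
  data.frame(
    count = str_count(sentences, pattern),
    sentence = sentences,
    stringsAsFactors = FALSE
  )
}

words <- c("apple", "pear", "grape")
sentences <- c("I have an apple and a pear", "Grape is my favorite", "I don't like pear")

result <- count_words(sentences, words)
result
```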

