Logical string matching
What is the best way to match a set of words against my sentences? Here is a small example:
words <- c("apple", "pear", "grape")
sentences <- c("I have an apple and a pear", "Grape is my favorite", "I don't like pear")
Ideally, the output would look like this:
count sentence
2     "I have an apple and a pear"
1     "Grape is my favorite"
1     "I don't like pear"
I tried using str_count, but to no avail. Thanks for your help!
library(stringr)
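# Build one case-insensitive pattern, \b(apple|pear|grape)\b, and count its matches per sentence.
# Backslashes must be doubled inside an R string; a single "\b" would be a backspace character.
str_count(sentences, paste0("(?i)\\b(", paste0(words, collapse = "|"), ")\\b"))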
[1] 2 1 1
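How this works:
(?i): makes the pattern match case-insensitively, so Grape is counted for grape.
\b and \b (written \\b inside an R string): ensure each word is matched as a whole word, delimited by word boundaries. Without \b you could end up matching a longer, different word that merely contains one of your words, such as grapple, which contains apple.
( and ): form a group whose contents are the elements of words, collapsed into a single string and separated by the vertical bar |, the alternation metacharacter meaning 'OR'. Strictly speaking, plain parentheses make a capturing group; (?:...) would be the non-capturing form. Either one gives the same counts here.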
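To make the effect of the word boundaries concrete, here is a minimal sketch. The sample sentence and the variable names (text, loose, strict) are invented for illustration and are not part of the question.

```r
library(stringr)

# Hypothetical sample: "grapple" contains "apple" but is a different word.
text <- "I grapple with an apple"

# No word boundaries: "apple" is also found inside "grapple".
loose <- str_count(text, "(?i)(apple)")

# With \\b on both sides, only the standalone word "apple" is counted.
strict <- str_count(text, "(?i)\\b(apple)\\b")

print(c(loose = loose, strict = strict))
```

Here loose comes out as 2 and strict as 1, which is why the pattern in the answer wraps the alternation in \\b on both sides.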
If you want to put this in a data frame:
df <- data.frame(
  sentences = sentences,
  # same pattern as above; note the doubled backslashes in \\b
  count = str_count(sentences, paste0("(?i)\\b(", paste0(words, collapse = "|"), ")\\b")))
Result:
df
                   sentences count
1 I have an apple and a pear     2
2       Grape is my favorite     1
3          I don't like pear     1