Count the number of words using a given set of keywords in R

Question

How can I count, for each observation, the number of words that match a given fixed set of keywords? To clarify, here is an example.

Here are the "Text" and a set of "Keywords":

Text=c("I have bought a shirt from the store", "This shirt looks very good")
Keywords=c("have", "from", "good")

I would like to get the following output.

output=c(2,1)

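In the first sentence of "Text" (i.e., "I have bought a shirt from the store"), "Keywords" are observed twice: "have" and "from". Similarly, in the second sentence of "Text", "Keywords" are observed once: "good".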

Answer 1

You can add word boundaries (\\b) to Keywords and collapse them into a single string to use in str_count.

library(stringr)
str_count(Text, str_c('\\b', Keywords, '\\b', collapse = '|'))
#[1] 2 1

In base R, you can use regmatches + gregexpr.

lengths(regmatches(Text, gregexpr(paste0('\\b', Keywords, '\\b', collapse = '|'), Text)))
#[1] 2 1
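
To see what the collapsed pattern looks like and why the word boundaries matter, here is a minimal base R sketch. The helper name count_keywords and the extra test sentence are made up for illustration; the sketch also assumes the keywords contain no regex metacharacters.

```r
Text <- c("I have bought a shirt from the store", "This shirt looks very good")
Keywords <- c("have", "from", "good")

# Hypothetical helper: count whole-word keyword matches in each element of `text`.
count_keywords <- function(text, keywords) {
  # "\\b" in an R string is the regex token \b (word boundary).
  pattern <- paste0("\\b", keywords, "\\b", collapse = "|")
  # gregexpr finds every match; regmatches extracts them; lengths counts them.
  lengths(regmatches(text, gregexpr(pattern, text)))
}

# Inspect the regex that is actually searched for.
cat(paste0("\\b", Keywords, "\\b", collapse = "|"), "\n")
# \bhave\b|\bfrom\b|\bgood\b

count_keywords(Text, Keywords)
# [1] 2 1

# With boundaries, "goods" is not counted as "good".
count_keywords("These goods are good", "good")
# [1] 1
```

Without the \\b parts, the last call would count 2, because "good" also occurs inside "goods".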

Answer 2

You can use this call: unlist(lapply(lapply(Text,stringr::str_detect,Keywords),sum))
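
The nested call can be easier to follow when split into steps. The sketch below does that under the assumption that stringr is installed; the intermediate names hits and counts are illustrative only.

```r
library(stringr)

Text <- c("I have bought a shirt from the store", "This shirt looks very good")
Keywords <- c("have", "from", "good")

# Step 1: for each sentence, test every keyword.
# Each list element is a logical vector with one entry per keyword.
hits <- lapply(Text, str_detect, Keywords)

# Step 2: summing a logical vector counts its TRUE values.
counts <- vapply(hits, sum, integer(1))
counts
# [1] 2 1
```

Note that str_detect only reports whether a keyword occurs at all, and it matches substrings, so a keyword repeated in one sentence is counted once and "have" would also match inside "behave".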

lapply lets you apply a function to each element of a vector, so this call:
