Check which words show up at least once within words from another vector

Question

Suppose we have a list of words:

words = c("happy","like","chill")

Now I have another string variable:

s = "happyMeal"

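I want to check which word in words matches part of s. So s could be "happyTime", "happyFace" or "happyHour", but as long as "happy" is in there, I want the result to return the index of the word "happy" in the string vector words.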

This question is similar to, but not the same as, the one asked in this post: Find a string in another string in R.

Answer 1

You can use sapply to loop over each word you want to search for, using grepl to determine whether that word appears in s:

sapply(words, grepl, s)
# happy  like chill 
#  TRUE FALSE FALSE
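One caveat worth knowing: grepl treats each word as a regular expression, so characters such as "." match more than themselves. The sketch below uses hypothetical inputs words2 and s2 (not from the question) to show the difference that passing fixed = TRUE through sapply makes; with plain alphabetic words like "happy" both forms give the same answer.

```r
# Hypothetical inputs chosen to contain a regex metacharacter
words2 <- c("a.b", "happy")
s2 <- "axb happy"

# Default: "a.b" is a regex, so "." matches the "x" in "axb"
regex_hit <- sapply(words2, grepl, s2)                 # a.b TRUE, happy TRUE

# fixed = TRUE is forwarded to grepl: each word is matched literally
literal_hit <- sapply(words2, grepl, s2, fixed = TRUE) # a.b FALSE, happy TRUE
```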

If s is a single string, sapply with grepl returns a logical vector, which you can use to pick out the matching words:

words[sapply(words, grepl, s)]
# [1] "happy"
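The question asks for the index of the matching word rather than the word itself. A minimal sketch, assuming s is a single string: wrapping the same logical vector in which() gives positions in words. It uses vapply instead of sapply so the result is always a logical vector, and fixed = TRUE for literal matching; both are choices made here, not part of the answer above.

```r
words <- c("happy", "like", "chill")
s <- "happyMeal"

# One TRUE/FALSE per word; x = s is named, so each word is passed as the pattern
hits <- vapply(words, grepl, logical(1), x = s, fixed = TRUE)

# Positions of the matching words; unname() drops the names vapply attached
idx <- unname(which(hits))
idx
# [1] 1
```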

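When s contains several strings, sapply with grepl returns a logical matrix (one row per string in s, one column per word), and you can use the column sums to determine which words appear at least once: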

s <- c("happyTime", "chilling", "happyFace")
words[colSums(sapply(words, grepl, s)) > 0]
# [1] "happy" "chill"
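Because sapply returns a vector for a single string but a matrix for several, the colSums version stops with an error when s has length 1. A sketch of one way to cover both cases with the same code, using a hypothetical helper matching_words(): for each word, any() checks whether it occurs in at least one element of s, so vapply always yields exactly one logical per word.

```r
# Hypothetical helper: elements of `words` that occur literally in at least
# one element of `s`; works the same whether `s` has length 1 or more
matching_words <- function(words, s) {
  found <- vapply(words, function(w) any(grepl(w, s, fixed = TRUE)), logical(1))
  words[found]
}

words <- c("happy", "like", "chill")
matching_words(words, "happyMeal")
# [1] "happy"
matching_words(words, c("happyTime", "chilling", "happyFace"))
# [1] "happy" "chill"
```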

Answer 2

Here is an option using stri_detect_regex from stringi:

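library(stringi)
s <- "happyMeal"
# stri_detect_regex() is vectorized over the pattern, so each word is tested against s
words[stri_detect_regex(s, words)]
#[1] "happy"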
