Searching for a group of words within the same sentence in R

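I am trying to search for a group of words within the context of a single sentence. For example, I want to find out whether the words "not" and "sugar" both occur in the same sentence.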

string = c(
"I do not like sugar. However, I like coffee.", 
"I like sugar. But I do not like coffee.")

Both texts contain the words "not" and "sugar", but only the first text contains both words within the same sentence. In the second text, "not" and "sugar" appear in different sentences.

I would like to return TRUE for the first text and FALSE for the second.

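I tried grepl("not\\ssugar", string), but that only matches "not" followed directly by a single whitespace character and then "sugar".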

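Here is one possible approach. It is certainly not the most efficient, nor the easiest to read (!), but it has the advantage of also showing you which sentence matches. I have kept the set of words to test separate from the rest of the code, so that you can test the co-occurrence of any number of words.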

string = c(
  "I do not like sugar. However, I like coffee.", 
  "I like sugar. But I do not like coffee.")

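checkwords = lapply(string,
  FUN = function(str, words = c("sugar", "not")) {
    # split the text into sentences on literal periods
    sapply(strsplit(str, "\\.")[[1]], FUN = function(el) {
      # TRUE only if every word is found in this sentence
      all(sapply(words, FUN = function(wd) grepl(wd, el)))
    })
  })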
# yes this can be a one line instruction...
checkwords

 [[1]]
     I do not like sugar  However, I like coffee 
               TRUE                   FALSE 

 [[2]]
              I like sugar  But I do not like coffee 
                     FALSE                     FALSE 

Then check whether each element of the initial string vector contains at least one TRUE:

sapply(checkwords, any)
[1]  TRUE FALSE
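
The two steps above can also be folded into one reusable function. The following is only a sketch: the name words_in_same_sentence is made up for illustration, and it departs from the code above in three ways that are assumptions rather than requirements of the question. It splits on "!" and "?" as well as ".", it matches whole words via \\b (so "not" would not match inside "nothing"), and it ignores case. It also assumes the words contain no regex metacharacters.

```r
# Hypothetical helper: TRUE when all `words` occur together in at least
# one sentence of each element of `text`.
words_in_same_sentence <- function(text, words) {
  # one character vector of sentences per input text
  sentences <- strsplit(text, "[.!?]+")
  vapply(sentences, function(sents) {
    # does any single sentence contain every word?
    any(vapply(sents, function(s) {
      all(vapply(words, function(w) {
        grepl(paste0("\\b", w, "\\b"), s, ignore.case = TRUE, perl = TRUE)
      }, logical(1)))
    }, logical(1)))
  }, logical(1))
}

string <- c(
  "I do not like sugar. However, I like coffee.",
  "I like sugar. But I do not like coffee.")

words_in_same_sentence(string, c("not", "sugar"))
# [1]  TRUE FALSE
```

Because each word is tested separately, the order in which the words appear in the sentence does not matter.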

Your attempt was very close.... This [^\\.,!?:;] allows any character other than punctuation between "not" and "sugar".

string = c(
  "I do not like sugar. However, I like coffee.", 
  "I like sugar. But I do not like coffee.",
  "I do not like coffee. But I love sugar.")

grepl("not[^\\.,!?:;]*sugar", string)
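
Note that this pattern only finds "not" when it comes before "sugar". If the reverse order should count too, one possible extension is to add the mirrored alternative, as in the sketch below. The fourth text is an invented example, ignore.case = TRUE is an assumption so that "Sugar" at the start of a sentence matches, and the character class is written without the backslash because a dot is already literal inside brackets.

```r
string <- c(
  "I do not like sugar. However, I like coffee.",
  "I like sugar. But I do not like coffee.",
  "I do not like coffee. But I love sugar.",
  "Sugar is not for me. Coffee is fine.")  # invented example, reverse order

# "not" before "sugar", with no punctuation in between
one_way <- grepl("not[^.,!?:;]*sugar", string)
one_way
# [1]  TRUE FALSE FALSE FALSE

# either order, still with no punctuation in between
either_way <- grepl("not[^.,!?:;]*sugar|sugar[^.,!?:;]*not",
                    string, ignore.case = TRUE)
either_way
# [1]  TRUE FALSE FALSE  TRUE
```

As with the original pattern, a comma also counts as a boundary here, so words separated by a comma within one sentence are not matched.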