How can I search multiple words in the same regex?

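I have a list of specific words that I want to remove from a list of sentences. Do I have to loop over the list and apply a function for each regular expression, or can I somehow apply them all in a single call? I have tried doing this with lapply, but I am hoping to find a better way.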

 string <- 'This is a sample sentence from which to gather some cool 
 knowledge'

 words <- c('a','from','some')

lapply(words,function(x){
  string <- gsub(paste0('\\b',words,'\\b'),'',string)
})

My desired output is: This is sample sentence which to gather cool knowledge.

You can collapse the character vector of words to remove into a single pattern using the regex OR operator ("|"), sometimes called the "pipe" symbol.

gsub(paste0('\\b',words,'\\b', collapse="|"), '', string)
[1] "This is  sample sentence  which to gather  cool \n knowledge"

Or, to also remove one trailing whitespace character after each word:

gsub(paste0('\\b',words,'\\b\\s{0,1}', collapse="|"), '', string)
[1] "This is sample sentence which to gather cool \n knowledge"
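To make the collapsing step more concrete, here is a minimal sketch of a variant that wraps the alternatives in a single non-capturing group and then squeezes the leftover spaces. It assumes a single-line version of the sentence, and the names `pattern` and `result` are only illustrative; `perl = TRUE` is used so that the `(?:...)` group syntax is handled by the PCRE engine.

```r
string <- 'This is a sample sentence from which to gather some cool knowledge'
words <- c('a', 'from', 'some')

# Build one pattern of the form \b(?:a|from|some)\b
pattern <- paste0('\\b(?:', paste(words, collapse = '|'), ')\\b')

# Remove every listed word in a single gsub() call
result <- gsub(pattern, '', string, perl = TRUE)

# Collapse the runs of spaces left behind and trim both ends
result <- trimws(gsub(' +', ' ', result))

print(pattern)
print(result)
# [1] "This is sample sentence which to gather cool knowledge"
```

Grouping the alternatives means the word-boundary anchors are written once instead of once per word, which keeps the pattern shorter when the word list is long.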

You need to use "|", the OR operator in regular expressions:

string2 <- gsub(paste0(words, ' ', collapse = '|'), '', string)

> string2
[1] "This is sample sentence which to gather cool knowledge"
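One caveat worth illustrating: a pattern without `\b` anchors, like the one above, also matches the end of longer words. The sketch below compares the two approaches on a made-up sentence; `text`, `loose`, and `strict` are hypothetical names chosen for this example.

```r
words <- c('a', 'from', 'some')
text <- 'a banana from some handsome farmer'

# No word boundaries: "a " also matches the end of "banana ",
# and "some " also matches the end of "handsome "
loose <- gsub(paste0(words, ' ', collapse = '|'), '', text)

# Word boundaries restrict the match to whole words only
strict <- gsub(paste0('\\b(?:', paste(words, collapse = '|'), ')\\b\\s?'),
               '', text, perl = TRUE)

print(loose)
# [1] "bananhandfarmer"
print(strict)
# [1] "banana handsome farmer"
```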

With the tm library, you can use removeWords():

string<-'This is a sample sentence from which to gather some cool knowledge'
words<-c('a', 'from', 'some')
library(tm)
string<-removeWords(string, words = words)
string
[1] "This is  sample sentence  which to gather  cool knowledge"

Or you can loop over the words with gsub like this:

string<-'This is a sample sentence from which to gather some cool knowledge'
words<-c('a', 'from', 'some')
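for(i in seq_along(words)) {
  # \\b restricts each match to a whole word, so 'a' is not removed from inside other words
  string<-gsub(pattern = paste0('\\b', words[i], '\\b'), replacement = '', x = string)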
}
string
[1] "This is  sample sentence  which to gather  cool knowledge"
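Since the question mentions a list of sentences, it may help to note that gsub is vectorized over its `x` argument, so one combined pattern can be applied to every sentence without a loop. This is only a sketch under that assumption; the `sentences` vector and the `cleaned` name are invented for illustration.

```r
# Hypothetical input: a character vector of sentences
sentences <- c('This is a sample sentence',
               'Take some words from a list')
words <- c('a', 'from', 'some')

# One combined pattern: each whole word plus an optional trailing whitespace character
pattern <- paste0('\\b(?:', paste(words, collapse = '|'), ')\\b\\s?')

# gsub() processes each element of the vector in a single call
cleaned <- gsub(pattern, '', sentences, perl = TRUE)

print(cleaned)
# [1] "This is sample sentence" "Take words list"
```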

Hope this helps.