Using gsub and mapply to remove a vector of words from another vector of words of different lengths

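I have a vector of words that I want to remove from another vector of words. I am using mapply and gsub, but I get the error "longer argument not a multiple of length of shorter".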

    sw_column <- c(stop_words$word)
    head(sw_column)
    [1] "a"         "a's"       "able"      "about"     "above"     "according"

    x <- c(amplification.words, deamplification.words, negation.words)
    head(x)
    [1] "acute"      "acutely"    "certain"    "certainly"  "colossal"   "colossally"

    stop_words_clean <- mapply(gsub, x, "", sw_column)
    error message: longer argument not a multiple of length of shorter

I want to remove all the words in x from sw_column. Note: not all of the words in x appear in sw_column.

Just a guess, but setdiff(x, y) returns the elements of "x" (the first argument) that are not in "y" (the second argument). So,

    stop_words_clean <- setdiff(sw_column, x)

may be what you want.

Example:

    sw_column <- c("a", "a's", "able", "about", "above", "according")
    x <- c("a", "able", "above")

    setdiff(sw_column, x)
    #[1] "a's"       "about"     "according"

As for gsub, that function modifies the elements of a character vector, which is not your stated objective.
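To make the point about gsub concrete, here is a minimal sketch using made-up vectors (the names `words` and `modified` are only for illustration and do not come from the question). It shows that gsub edits the text inside each element but never drops an element from the vector:

```r
# Illustrative data, not the asker's actual stop word list
words <- c("a", "able", "about")

# gsub replaces every match of the pattern "a" inside each element
modified <- gsub("a", "", words)

# The vector keeps its length: elements are altered, not removed.
# "a" becomes an empty string, and "able" / "about" lose their letter "a".
print(modified)
print(length(modified) == length(words))
```

So even if the mapply call ran without complaint, the result would most likely be mangled words and empty strings rather than a shorter vector.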

If you want to filter one vector of text by another, you can use the code below. I have made up some vectors to illustrate.

    stop_words_example <- c("a", "a's", "able", "about", "above", "according")
    x <- c("a", "a's", "able", "about", "above", "according", "acute", "acutely", "certain", "certainly", "colossal", "colossally")

    x[!x %in% stop_words_example]
    [1] "acute"      "acutely"    "certain"    "certainly"  "colossal"   "colossally"
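The same `%in%` idiom can be pointed the other way to match the question (dropping the words in x from sw_column). The sketch below uses invented data, including a deliberately duplicated entry, to show one difference from setdiff that may or may not matter for your data: setdiff returns unique values, while logical indexing with `%in%` keeps duplicates.

```r
# Illustrative data; "about" is repeated on purpose
sw_column <- c("a", "able", "about", "about", "according")
x <- c("a", "able", "acute")  # "acute" is not in sw_column, which is fine

# Keep every element of sw_column that is not in x (duplicates preserved)
kept_in <- sw_column[!sw_column %in% x]

# setdiff gives the same words, but de-duplicated
kept_setdiff <- setdiff(sw_column, x)

print(kept_in)
print(kept_setdiff)
```

For a stop word list with no repeated entries the two approaches should give the same result, so the choice is mostly a matter of taste.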