Using gsub and mapply to remove a vector of words from another vector of words of different lengths

Question

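I have a vector of words that I want to remove from another vector of words. I am using mapply and gsub, but I get the error "longer argument not a multiple of length of shorter".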

sw_column <- c(stop_words$word)
head(sw_column)
[1] "a"         "a's"       "able"      "about"     "above"     "according"

x <- c(amplification.words, deamplification.words, negation.words)
head(x)
[1] "acute"      "acutely"    "certain"    "certainly"  "colossal"   "colossally"

stop_words_clean <- mapply(gsub, x, "", sw_column)
error message: longer argument not a multiple of length of shorter

I want to remove all of the words in x from sw_column. Note: not every word in x appears in sw_column.

Answer 1

Just a guess, but setdiff(x, y) returns the elements of "x" (the first argument) that are not in "y" (the second argument). So,

stop_words_clean <- setdiff(sw_column, x)

might be what you want.

Example:

sw_column <- c("a", "a's","able","about", "above","according")
x <- c("a", "able", "above")

setdiff(sw_column, x)
#[1] "a's"       "about"     "according"

As for gsub, that function modifies the elements of a character vector, which is not your stated objective.
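
To make that last point concrete, here is a minimal sketch using made-up sample vectors (not the asker's real data). It shows that gsub rewrites strings instead of dropping elements, and that mapply pairs its arguments position by position, recycling the shorter one. That recycling is where the "longer argument not a multiple of length of shorter" message comes from when the lengths do not divide evenly. The lengths 6 and 3 are chosen here so the call runs without that message.

```r
# Hypothetical sample data, chosen only for illustration
sw_column <- c("a", "a's", "able", "about", "above", "according")
x <- c("a", "able", "above")

# gsub() edits each string in place; the vector keeps all six elements
stripped <- gsub("a", "", sw_column, fixed = TRUE)
print(stripped)
print(length(stripped) == length(sw_column))

# mapply() calls gsub(x[i], "", sw_column[i]) pairwise, recycling x,
# so each stop word is only compared with the pattern at the same position
paired <- mapply(gsub, x, "", sw_column, USE.NAMES = FALSE)
print(paired)
```

Neither call removes whole words from sw_column: the first strips the letter "a" out of every word, and the second only touches the entries that happen to line up with a matching pattern.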

Answer 2

If you want to filter one vector of words against another, you can use the code below. I have used some made-up vectors to illustrate.

stop_words_example <- c("a", "a's", "able", "about", "above", "according")
x <- c("a", "a's", "able", "about", "above", "according", "acute", "acutely", "certain", "certainly", "colossal", "colossally")

x[!x %in% stop_words_example]

[1] "acute"      "acutely"    "certain"    "certainly"  "colossal"   "colossally"
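
The example above drops the stop words from x. For the direction described in the question (removing the words in x from sw_column), the same %in% idiom can be turned around. The sketch below uses made-up vectors, including a deliberately repeated "about", to show one practical difference from setdiff: %in% keeps duplicates, while setdiff returns unique values only.

```r
# Hypothetical sample data; "about" is repeated on purpose
sw_column <- c("a", "a's", "able", "about", "above", "according", "about")
x <- c("a", "able", "above", "acute")  # "acute" is not in sw_column

# Keep every element of sw_column that is not found in x
kept_in <- sw_column[!sw_column %in% x]
print(kept_in)

# setdiff() gives the same words but removes the duplicate
kept_sd <- setdiff(sw_column, x)
print(kept_sd)
```

Words in x that never occur in sw_column, such as "acute" here, are simply ignored by both approaches, so the two vectors do not need matching lengths.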
