Converting a vector of sentences to a vector of words using apply functions

In R, I have this vector of sentences that I want to convert to a vector of words. How can I do this with apply functions?

test.sentences <- c("boy who boys see lives .",
                    "cats who Mary feeds hear .",
                    "girls who see see John .",
                    "John hears dogs .",
                    "John lives .",
                    "Mary hears cat .",
                    "boys who Mary chases see girl .",
                    "dog who John sees feeds Mary .",
                    "girls feed cats who see .",
                    "Mary chases girls who Mary chases .",
                    "Mary hears .",
                    "boy who hears cats walks .",
                    "girl who dog sees feeds boy .",
                    "Mary lives .",
                    "Mary sees boy .",
                    "cat who walks lives .",
                    "Mary sees girl who chases John .",
                    "John chases boys who boy hears .",
                    "cats hear boy who feeds boys .",
                    "girls who hear see cats who hear .",
                    "girls who cats feed chase John .",
                    "cat lives .",
                    "cats live ." )

You don't need any *apply() function for this. Here is a very simple and efficient approach using the stringi package:

stringi::stri_extract_all_words(test.sentences)

This returns a list with one element for each element of test.sentences, with the periods (.) removed. For an atomic vector, just wrap it in unlist(). For a matrix, use simplify = TRUE.
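The question specifically asks about apply functions, so here is a minimal base-R sketch that produces the same list-per-sentence shape with lapply() and then flattens it with unlist(). It is not stringi: it assumes words are separated by single spaces and that the period is always its own token, as in test.sentences, and it does not strip any other punctuation. The three-sentence `sentences` vector is just a hypothetical subset used to keep the example self-contained.

```r
# Hypothetical subset of test.sentences, named differently to avoid overwriting it
sentences <- c("boy who boys see lives .",
               "John lives .",
               "Mary chases girls who Mary chases .")

# Split each sentence on single spaces: a list with one character vector per sentence
tokens <- strsplit(sentences, " ", fixed = TRUE)

# lapply() drops the "." token from each sentence while keeping the list shape
words_by_sentence <- lapply(tokens, function(w) w[w != "."])

# unlist() flattens the list into one character vector of words
words <- unlist(words_by_sentence)
print(words)
```

The anonymous function filters with `w != "."` rather than `setdiff()`, because `setdiff()` would also drop repeated words such as the second "Mary".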

In base R:

res <- unlist(strsplit(test.sentences," "))
res[res != "."]

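Or strip the periods before splitting:

# "\\." escapes the regex dot; a bare "\." is not a valid R string escape
unlist(strsplit(gsub("\\.", "", test.sentences), " "))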

Here is a qdap approach (I maintain qdap):

library(qdap)
lapply(test.sentences, bag_o_words)

Or as a single vector:

bag_o_words(test.sentences)

Have you tried do.call? You could try this, though I'm not sure it fits your case:

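# do.call(c, ...) concatenates the per-sentence pieces into one vector;
# do.call(rbind, ...) would instead recycle shorter sentences to fill each row
vector_of_words <- do.call(c, strsplit(test.sentences, " "))
vector_of_words <- vector_of_words[vector_of_words != "."]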
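do.call(rbind, ...) only lines rows up correctly when every sentence has the same number of words; otherwise R recycles the shorter ones (with a warning), so short sentences get filled with repeated words. If a matrix with one row per sentence is what you want, a minimal sketch is to pad each split with NA first. The `sentences` vector is a hypothetical subset of test.sentences, and padding with NA is one choice among several (stringi's simplify = TRUE pads with empty strings instead).

```r
# Hypothetical subset of test.sentences with different word counts
sentences <- c("boy who boys see lives .",
               "John lives .",
               "Mary hears cat .")

# Split on single spaces and drop the "." token from each sentence
tokens <- strsplit(sentences, " ", fixed = TRUE)
tokens <- lapply(tokens, function(w) w[w != "."])

# Pad every sentence with NA up to the longest one so rbind() has nothing to recycle
max_len <- max(lengths(tokens))
padded <- lapply(tokens, function(w) c(w, rep(NA_character_, max_len - length(w))))

# One row per sentence, one column per word position
word_matrix <- do.call(rbind, padded)

# Sentence text alongside its words, similar in spirit to the cbind() above
with_text <- cbind(sentence = sentences, word_matrix)
print(with_text)
```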