tm package: removeWords function concatenates words in R
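I am cleaning sample data with removeWords from the tm package, but the function concatenates the remaining words after the removal. The result should be "environmental dead frog" "environmental dead mouse". Can anyone advise?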
library(tm)
dc<-c("environmental dead frog still","environmental dead mouse come")
manualremovelist<-c("the","does","doesn't","please","new","ok","one","cant",
"doesnt","can","still","done","will","without","seen",
"also","danfoss","case","doesn´t","due","need","occurs","made",
"using","now","make","makes","needs","put","okay","sno","since","therefore",
"found","milwaukee","probably","got","finally","isnt","per","two",
"obvious","unable","must","nos","3nos","1no",".","phone","tel","attached",
"given","find","have","see","be","give","do","come","use","make","get",
"try","call","request")
dc<-removeWords(dc,manualremovelist)
dc
#[1] "environmentaldeadfrog" "environmentaldeadmouse"
removeWords works only on words. You can split each string into words and apply removeWords to the individual phrases/sentences.
library(tm)
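# Split each string on whitespace, remove the listed words token by token,
# then glue the surviving tokens back together with single spaces.
# Note the doubled backslash: '\s+' alone is an invalid escape in an R string.
dc <- sapply(strsplit(dc, '\\s+'), function(x)
  trimws(paste0(removeWords(x, manualremovelist), collapse = ' ')))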
dc
#[1] "environmental dead frog" "environmental dead mouse"
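A possible explanation for the concatenation, offered as an assumption rather than something confirmed in this thread: removeWords appears to join the word list into one regular expression wrapped in \b word boundaries, so the "." entry in manualremovelist would act as a wildcard that also matches a single space sitting between two words. The base-R sketch below imitates that behaviour with a hypothetical helper, remove_words_sketch, which is not part of tm.

```r
# Hypothetical helper: an assumed imitation of how tm::removeWords builds its
# pattern (all words joined with "|" between \b word boundaries).
# This is a sketch for illustration, not the package's actual source.
remove_words_sketch <- function(x, words) {
  pattern <- sprintf("(*UCP)\\b(%s)\\b",
                     paste(sort(words, decreasing = TRUE), collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

dc <- c("environmental dead frog still", "environmental dead mouse come")

# With "." in the list, the wildcard also matches each single space that has
# a word on both sides, so the spaces disappear along with the stop words.
with_dot <- remove_words_sketch(dc, c("still", "come", "."))
with_dot
#[1] "environmentaldeadfrog" "environmentaldeadmouse"

# Without ".", only the listed words are removed; trimws() drops the
# trailing space left behind by the removed last word.
without_dot <- trimws(remove_words_sketch(dc, c("still", "come")))
without_dot
#[1] "environmental dead frog" "environmental dead mouse"
```

If this assumption holds for your version of tm, dropping "." from manualremovelist (and stripping punctuation in a separate step) may be enough, without splitting the strings first.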