Unable to get tm_map to use mc.cores argument
I have a large corpus with over 10M documents. Whenever I try to run a transformation across multiple cores using the mc.cores argument, I get this error:
Error in FUN(content(x), ...) : unused argument (mc.cores = 10)
My current hosted RStudio instance has 15 cores available.
# I have a corpus
> inspect(corpus[1])
<<VCorpus>>
Metadata: corpus specific: 0, document level (indexed): 0
Content: documents: 1
[[1]]
<<PlainTextDocument>>
Metadata: 7
Content: chars: 46
> length(corpus)
[1] 10255313
Here is what happens when I try to apply a transformation with tm_map:
library(tidyverse)
library(qdap)
library(stringr)
library(tm)
library(textstem)
library(stringi)
library(SnowballC)
For example:
> corpus <- tm_map(corpus, content_transformer(replace_abbreviation), mc.cores = 10)
Error in FUN(content(x), ...) : unused argument (mc.cores = 10)
I tried adding lazy = T:
corpus <- tm_map(corpus, content_transformer(replace_abbreviation), mc.cores = 10, lazy = T) # read the documentation, still don't really get what this does
After the transformation, if I then run, for example,
> corpus[[1]][1]
I get:
Error in FUN(content(x), ...) : unused argument (mc.cores = 10)
whereas before the transformation I would get:
> corpus.beforetransformation[[1]][1]
$content
[1] "here is some text"
What am I doing wrong here? How can I use the mc.cores argument to put more of my processors to work?
Reproducible example:
sometext <- c("cats dogs rabbits", "oranges banannas pears", "summer fall winter") %>%
data.frame(stringsAsFactors = F) %>% DataframeSource %>% VCorpus
corpus.example <- tm_map(sometext, content_transformer(replace_abbreviation), mc.cores = 2, lazy = T)
corpus.example[[1]][1]
From the tm documentation, try the following:
options(mc.cores = 10) # or whatever
tm_parLapply_engine(parallel::mclapply) # mclapply gets the number of cores from global options
tm_map(sometext, content_transformer(replace_abbreviation))
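As far as I can tell, the error comes from how the extra arguments travel: whatever you pass to tm_map after the function is handed on to that function, so mc.cores = 10 ends up as an argument to replace_abbreviation, which does not accept it. Here is a minimal sketch of that mechanism using only base R and the parallel package. Note that to_upper and apply_to_docs are hypothetical stand-ins I made up for illustration, not tm functions, and n_cores = 2 is an arbitrary choice.

```r
library(parallel)

# Hypothetical stand-in for a transformation such as replace_abbreviation:
# it accepts only the text, so any extra argument is rejected.
to_upper <- function(x) toupper(x)

# Hypothetical stand-in for a map function that forwards `...` to FUN.
apply_to_docs <- function(docs, FUN, ...) lapply(docs, FUN, ...)

docs <- list("cats dogs rabbits", "summer fall winter")

# Forwarding mc.cores to a function that does not accept it fails;
# the error message is captured instead of stopping the script.
res <- tryCatch(
  apply_to_docs(docs, to_upper, mc.cores = 2),
  error = function(e) conditionMessage(e)
)
print(res)

# mclapply takes its default core count from the global option instead.
# Forking is not available on Windows, so fall back to one core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
options(mc.cores = n_cores)
out <- mclapply(docs, to_upper)
print(unlist(out))
```

This would also fit what you saw with lazy = T: the mapping appears to be deferred until a document is accessed, so the same error only surfaces at corpus[[1]][1]. Keep in mind that mclapply relies on forking, so on Windows it cannot use more than one core.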