Registered doParallel cluster doesn't work with the train/caret parRF model
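I can't get parRF to work, even though other things such as parApply work fine.
I have tried makeCluster as well as makePSOCKcluster and a few similar variants.
It keeps returning the error "task 1 failed - could not find function getDoParWorkers".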
library(caret)
library(doParallel)
# leave two cores free for the rest of the system
cores_2_use <- detectCores() - 2
cl <- makeCluster(cores_2_use, useXDR = FALSE)
clusterSetRNGStream(cl, 9956)
registerDoParallel(cl, cores_2_use)
rf_train <- train(y = y, x = x,
                  method = 'parRF', tuneGrid = data.frame(mtry = ncol(x)), na.action = na.omit,
                  trControl = trainControl(method = 'oob', number = 10, allowParallel = TRUE)
)
Error in { : task 1 failed - "could not find function "getDoParWorkers""
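I can reproduce your error message. Solving it takes a bit of hacking. I'm not sure whether this is a bug or something else.
But I managed to get it working by copying the model and adjusting the fit function. I added require(foreach) to the fit function.
The strange thing is that once train has been run with the new parRF_mod as the method, the original train call that produced the error runs without any error. Starting from a clean session, the error appears again. So somewhere something is not working as intended.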
library(doParallel)
# type = "SOCK" makes parallel::makeCluster hand off to the snow package
cl <- makeCluster(parallel::detectCores() - 1, type = "SOCK")
registerDoParallel(cl)
getDoParWorkers()
library(caret)
library(randomForest)
y <- mtcars$mpg
# drop the response column by name; -mtcars$mpg would index by the negated mpg values
x <- mtcars[, names(mtcars) != "mpg"]
# copy the built-in parRF model definition and replace its fit function
parRF_mod <- getModelInfo("parRF", regex = FALSE)[[1]]
parRF_mod$fit <- function(x, y, wts, param, lev, last, classProbs, ...) {
  # added the requirement of foreach
  require(foreach)
  workers <- getDoParWorkers()
  theDots <- list(...)
  theDots$ntree <- if (is.null(theDots$ntree)) 250 else theDots$ntree
  theDots$x <- x
  theDots$y <- y
  theDots$mtry <- param$mtry
  # split the requested number of trees evenly across the workers
  theDots$ntree <- ceiling(theDots$ntree / workers)
  out <- foreach(ntree = 1:workers, .combine = combine) %dopar% {
    library(randomForest)
    do.call("randomForest", theDots)
  }
  out$call["x"] <- "x"
  out$call["y"] <- "y"
  out
}
rf_train <- train(y = y, x = x,
                  method = parRF_mod, tuneGrid = data.frame(mtry = ncol(x)), na.action = na.omit,
                  trControl = trainControl(method = 'oob', number = 10, allowParallel = TRUE)
)
stopCluster(cl)
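One possible reading of the behaviour described above, offered as an assumption rather than something confirmed in this thread: a socket worker is a fresh R process, so a package function called by its bare name is only found there once the package has been attached on that worker, and it then stays attached for as long as the cluster lives. The sketch below uses only the base parallel package, with tools and its file_ext function standing in for foreach and getDoParWorkers. The names bare_call and attached_call are made up for illustration.

```r
library(parallel)

cl <- makeCluster(2)  # two fresh local R worker processes

# Task that calls file_ext() by bare name. "tools" is a stand-in for foreach
# and is assumed not to be attached on a freshly started worker.
bare_call <- function(path) file_ext(path)

# Same task, but it attaches the package first, in the spirit of the
# require(foreach) line added to the fit function.
attached_call <- function(path) {
  require(tools)
  file_ext(path)
}

# First attempt: the workers cannot find the function, so keep the error text.
bare_result <- tryCatch(
  clusterCall(cl, bare_call, "model.rds"),
  error = function(e) conditionMessage(e)
)

# Attaching inside the task makes the call succeed on every worker.
attached_result <- unlist(clusterCall(cl, attached_call, "model.rds"))

# The package is still attached on the workers, so the bare call now succeeds.
bare_again <- unlist(clusterCall(cl, bare_call, "model.rds"))

stopCluster(cl)
```

If this reading applies to parRF as well, it would be consistent with the original train call succeeding after the modified model has run on the same cluster and failing again in a clean session.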
My session info:
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=Dutch_Netherlands.1252 LC_CTYPE=Dutch_Netherlands.1252 LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Netherlands.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] randomForest_4.6-12 e1071_1.6-7 caret_6.0-58 ggplot2_1.0.1 lattice_0.20-33 doParallel_1.0.10 iterators_1.0.8
[8] foreach_1.4.3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.1 magrittr_1.5 splines_3.2.2 MASS_7.3-44 munsell_0.4.2 colorspace_1.2-6 minqa_1.2.4 stringr_1.0.0
[9] car_2.1-0 plyr_1.8.3 tools_3.2.2 nnet_7.3-11 pbkrtest_0.4-2 grid_3.2.2 gtable_0.1.2 nlme_3.1-122
[17] mgcv_1.8-8 quantreg_5.19 snow_0.3-13 class_7.3-14 MatrixModels_0.4-1 lme4_1.1-10 digest_0.6.8 Matrix_1.2-2
[25] nloptr_1.0.4 reshape2_1.4.1 codetools_0.2-14 stringi_1.0-1 compiler_3.2.2 scales_0.3.0 stats4_3.2.2 SparseM_1.7
[33] proto_0.3-10
Update: Topepo has updated the code on GitHub to fix this bug! Just run devtools::install_github("topepo/caret/pkg/caret").
My previous answer below is deprecated.
Someone on GitHub also suggested this workaround:
# parallel
library(caret)
library(doParallel)
cl <- makePSOCKcluster(detectCores())
# attach foreach on every worker before registering the cluster
clusterEvalQ(cl, library(foreach))
registerDoParallel(cl)
y <- mtcars$mpg
x <- mtcars[, names(mtcars) != "mpg"]  # drop the response column by name
#--------------------------------------------------------------
rf_train <- train(y = y, x = x,
                  method = 'parRF', tuneGrid = data.frame(mtry = ncol(x)), na.action = na.omit,
                  trControl = trainControl(method = 'oob', number = 10, allowParallel = TRUE)
)
rf_train
#--------------------------------------------------------------
stopCluster(cl)
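Be sure to start from a fresh session before running this version of the code. Even after calling stopCluster(cl) and stopImplicitCluster(), this approach did not work for me when I tried parRF again, until I completely restarted R and RStudio.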
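When it is unclear what state the workers are in, it can help to ask them directly which packages they have attached. The following is a minimal sketch using only the base parallel package, with tools standing in for foreach. The helper is_attached is a hypothetical name, and the check says nothing about which backend foreach currently has registered.

```r
library(parallel)

cl <- makeCluster(2)  # two fresh local R worker processes

# Hypothetical helper, evaluated on a worker: is `pkg` on its search path?
is_attached <- function(pkg) paste0("package:", pkg) %in% search()

# Freshly started workers are assumed not to have "tools" attached.
before <- unlist(clusterCall(cl, is_attached, "tools"))

# Attach the package on every worker, as clusterEvalQ(cl, library(foreach))
# does in the workaround.
invisible(clusterEvalQ(cl, library(tools)))

after <- unlist(clusterCall(cl, is_attached, "tools"))

stopCluster(cl)
```

Running the same check for foreach on the cluster passed to registerDoParallel would show whether the clusterEvalQ step actually reached the workers that train ends up using.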