R: caret does not use the master node of a PSOCKcluster when using a parallel backend

I am trying to get caret to train an xgboost model over a grid of hyperparameters using a parallel backend.

Here is some code that uses the Give Me Some Credit data to demonstrate setting up a parallel backend for caret's hyperparameter grid search.

library(plyr)
library(dplyr)
library(pROC)
library(caret)
library(xgboost)
library(readr)
library(parallel)
library(doParallel)

if(exists("xgboost_cluster")) stopCluster(xgboost_cluster)
hosts = paste0("192.168.18.", 52:53)
xgboost_cluster = makePSOCKcluster(hosts, master="192.168.18.51")

# load the packages across the cluster
clusterEvalQ(xgboost_cluster, {
  deps = c("plyr", "Rcpp", "dplyr", "caret", "xgboost", "pROC", "foreach", "doParallel")
  for(d in deps) library(d, character.only = TRUE)
  rm(d, deps)
})

registerDoParallel(xgboost_cluster)  
# load in the training data
df_train = read_csv("04-GiveMeSomeCredit/Data/cs-training.csv") %>%
  na.omit() %>%                                                                # listwise deletion 
  select(-`[EMPTY]`) %>%
  mutate(SeriousDlqin2yrs = factor(SeriousDlqin2yrs,                           # factor variable for classification
                                   labels = c("Failure", "Success")))
# set up the cross-validated hyper-parameter search
xgb_grid_1 = expand.grid(
  nrounds = 1000,
  eta = c(0.01, 0.001, 0.0001),
  max_depth = c(2, 4, 6, 8, 10),
  gamma = 1
)

# pack the training control parameters
xgb_trcontrol_1 = trainControl(
  method = "cv",
  number = 5,
  verboseIter = TRUE,
  returnData = FALSE,
  returnResamp = "all",                                                        # save losses across all models
  classProbs = TRUE,                                                           # set to TRUE for AUC to be computed
  summaryFunction = twoClassSummary,
  allowParallel = TRUE
)

# train the model for each parameter combination in the grid, 
#   using CV to evaluate
xgb_train_1 = train(
  x = as.matrix(df_train %>%
                  select(-SeriousDlqin2yrs)),
  y = as.factor(df_train$SeriousDlqin2yrs),
  trControl = xgb_trcontrol_1,
  tuneGrid = xgb_grid_1,
  method = "xgbTree"
)

I have checked that all the cores on the hosts are being used for training, but on the master node no processes are used at all. Is this the expected behaviour? Is there any way to change this behaviour and use the cores on the master node for processing as well?

To make use of the master node for processing, just add 'localhost' to hosts, like this:

hosts = c("localhost", paste0("192.168.18.", 52:53))

This adds one core on your master node to the cluster, which will then be used for processing. If you want to add multiple cores, just pass in more instances of 'localhost':

hosts = c(rep("localhost", detectCores()), paste0("192.168.18.", 52:53))
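
If it helps to confirm that the master's cores are actually participating, a minimal sketch along these lines lists the hostname each worker reports. The IPs are the ones from the question, and note that detectCores() is evaluated on the master, so it adds one worker per logical core of the master machine:

library(parallel)
library(doParallel)

# one worker per logical core on the master, plus the two remote hosts
hosts = c(rep("localhost", detectCores()), paste0("192.168.18.", 52:53))
xgboost_cluster = makePSOCKcluster(hosts, master = "192.168.18.51")
registerDoParallel(xgboost_cluster)

# count how many workers are running on each machine
table(unlist(clusterCall(xgboost_cluster, function() Sys.info()[["nodename"]])))

You should see the master's own hostname appear in the counts alongside the two remote hosts once 'localhost' entries are included.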