Parallelise rfcv from the randomForest package in R

Question

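I am trying to do multivariate random forest feature selection with the rfcv function. I managed to get the normal rf command (which builds the random forest model) to run in parallel using the following: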

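library(randomForest)
library(doMC)  # also attaches foreach and parallel (for detectCores)

nCores <- detectCores()  # number of cores on the machine
registerDoMC(nCores)

# Grow roughly 510 trees in total, split evenly across the cores,
# then merge the sub-forests with randomForest::combine
rf.model <- foreach(ntree = rep(round(510 / nCores), nCores),
                    .combine = combine, .multicombine = TRUE,
                    .packages = "randomForest") %dopar% {
  randomForest(x = predictor, y = outcome, ntree = ntree, mtry = 4,
               norm.votes = FALSE, importance = TRUE)
}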

Before using this, I would like to do feature selection with rfcv. I tried to follow the same approach as above:

rf.model <- foreach(1:nCores, .packages = "randomForest") %dopar% {
  rfcv(trainx = predictor, trainy = outcome, scale = 4)
}

However, the function is simply run several times over, so I only get rf.rfcv as a list of 4 identical results.

Any help would be much appreciated! Thanks!

Answer 1

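randomForest parallelises seamlessly because the randomForest::combine function reduces the 4 rf objects to a single object. So in your first code example you simply train 4 forest models with different random seeds. With .combine=combine (implicitly .combine=randomForest::combine) you specify that the list of 4 model outputs should be reduced with the dedicated combine function from the randomForest package.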
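To make the reduction step concrete, here is a minimal sketch of what combine does outside of foreach. It uses the built-in iris data purely as a stand-in (it is not the asker's data), and the object names rf.a, rf.b and rf.all are made up for illustration.

```r
library(randomForest)

set.seed(1)
# Two independently grown forests on the same data (iris is only a stand-in).
rf.a <- randomForest(x = iris[, 1:4], y = iris$Species,
                     ntree = 50, norm.votes = FALSE)
rf.b <- randomForest(x = iris[, 1:4], y = iris$Species,
                     ntree = 50, norm.votes = FALSE)

# combine() concatenates the trees of both forests into one randomForest object.
rf.all <- randomForest::combine(rf.a, rf.b)
rf.all$ntree  # 100: the trees of both forests
```

Note that the combined object no longer carries the confusion matrix or the error rates of the individual forests, so those have to be re-estimated if they are needed.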

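rfcv does not come with any combine function, and naively merging the four outputs would not make sense either. In your code, foreach simply runs the function 4 times and returns the outputs in a list. If you want to run rfcv in parallel, one fix would be the following: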

my.rfcv <- randomForest::rfcv  # copy the function from the package into .GlobalEnv
fix(my.rfcv)  # inspect the function, and perhaps copy all of it into your own source script

# rewrite the for-loop at lines 35-57 into a foreach loop
# write a reducer to combine the test results of each fold
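As a rough sketch of those two steps, the function below parallelises the loop over cross-validation folds and then reduces the per-fold predictions into the same kind of result that rfcv returns. The name rfcv.par is hypothetical (it is not part of randomForest), and the sketch makes simplifying assumptions: only the "log" scale is supported, there is no recursive option, and ntree is passed explicitly instead of through "...". The iris data in the usage lines is again just a stand-in.

```r
library(randomForest)
library(doMC)  # also attaches foreach and parallel

# Hypothetical helper, not part of randomForest: fold-parallel variant of rfcv().
rfcv.par <- function(trainx, trainy, cv.fold = 5, step = 0.5, ntree = 500,
                     mtry = function(p) max(1, floor(sqrt(p)))) {
  classRF <- is.factor(trainy)
  n <- nrow(trainx)
  p <- ncol(trainx)

  # Numbers of variables to evaluate: p, p * step, p * step^2, ..., 1
  k <- floor(log(p, base = 1 / step))
  n.var <- unique(round(p * step^(0:(k - 1))))
  if (!1 %in% n.var) n.var <- c(n.var, 1)
  k <- length(n.var)

  # Stratified fold assignment (regression: bin the response into 5 groups)
  f <- if (classRF) trainy else
    factor(rep(1:5, length.out = n)[order(order(trainy))])
  idx <- numeric(n)
  for (lev in levels(f)) {
    members <- which(f == lev)
    idx[members] <- sample(rep(1:cv.fold, length.out = length(members)))
  }

  # Parallel part: each fold returns a list of k prediction vectors
  fold.pred <- foreach(i = 1:cv.fold, .packages = "randomForest") %dopar% {
    train <- idx != i
    test <- idx == i
    preds <- vector("list", k)
    all.rf <- randomForest(x = trainx[train, , drop = FALSE], y = trainy[train],
                           xtest = trainx[test, , drop = FALSE],
                           ytest = trainy[test],
                           ntree = ntree, mtry = mtry(p), importance = TRUE)
    preds[[1]] <- all.rf$test$predicted
    # Rank variables by the first importance column, as rfcv itself does
    impvar <- order(all.rf$importance[, 1], decreasing = TRUE)
    for (j in seq_len(k)[-1]) {
      imp.idx <- impvar[1:n.var[j]]
      sub.rf <- randomForest(x = trainx[train, imp.idx, drop = FALSE],
                             y = trainy[train],
                             xtest = trainx[test, imp.idx, drop = FALSE],
                             ytest = trainy[test],
                             ntree = ntree, mtry = mtry(n.var[j]))
      preds[[j]] <- sub.rf$test$predicted
    }
    preds
  }

  # Reducer: put each fold's predictions back at the rows it was tested on
  cv.pred <- rep(list(trainy), k)
  for (i in 1:cv.fold) {
    for (j in 1:k) {
      cv.pred[[j]][idx == i] <- fold.pred[[i]][[j]]
    }
  }

  error.cv <- if (classRF) {
    sapply(cv.pred, function(x) mean(trainy != x))
  } else {
    sapply(cv.pred, function(x) mean((trainy - x)^2))
  }
  names(error.cv) <- names(cv.pred) <- n.var
  list(n.var = n.var, error.cv = error.cv, predicted = cv.pred)
}

# Usage on stand-in data
registerDoMC(2)
set.seed(42)
res <- rfcv.par(iris[, 1:4], iris$Species, cv.fold = 4, ntree = 100)
res$error.cv
```

Because each fold is evaluated exactly once, this speeds up a single cross-validation run instead of repeating it on every core. The forests are grown in forked workers, so set.seed in the parent session fixes the fold assignment but does not by itself make the fitted forests reproducible.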

