RWeka will not work with caret or possibly %dopar%

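I am working through the exercises in Applied Predictive Modeling, an R textbook for the caret package written by its authors. I cannot get the train function to work with the methods M5P or M5Rules.

The code runs fine when I do the tuning manually: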

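library(AppliedPredictiveModeling)  # provides permeability and fingerprints
library(caret)                      # createDataPartition, nearZeroVar, trainControl, RMSE
library(RWeka)                      # M5P, Weka_control
library(foreach)                    # foreach, %do%

data("permeability")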
trainIndex <- createDataPartition(permeability[, 1], p = 0.75, 
                              list = FALSE)
fingerNZV <- nearZeroVar(fingerprints, saveMetrics = TRUE)
trainY <- permeability[trainIndex, 1]
testY <- permeability[-trainIndex, 1]
trainX <- fingerprints[trainIndex, !fingerNZV$nzv]
testX <- fingerprints[-trainIndex, !fingerNZV$nzv]
indx <- createFolds(trainY, k = 10, returnTrain = TRUE)
ctrl <- trainControl('cv', index = indx)

m5Tuner <- t(as.matrix(expand.grid(
  N = c(1, 0),
  U = c(1, 0),
  M = floor(seq(4, 15, length.out = 3))
)))
startTime <- Sys.time()
m5Tune <- foreach(tuner = m5Tuner) %do% {
  m5ctrl <- Weka_control(M = tuner[3],
                       N = tuner[1] == 1,
                       U = tuner[2] == 1)
  mods <- lapply(ctrl$index,function(fold) {
    d <- cbind(data.frame(permeability = trainY[fold]),
               trainX[fold, ])
    mod <- M5P(permeability ~ ., d, control = m5ctrl)
    rmse <- RMSE(predict(mod, as.data.frame(trainX[-fold, ])), 
                 trainY[-fold])
    list(model = mod, rmse = rmse)
  })
  mean_rmse <- mean(sapply(mods, '[[', 'rmse'))
  list(models = mods, mean_rmse = mean_rmse)
}
endTime <- Sys.time()
endTime - startTime
# Time difference of 59.17742 secs

The same data and controls (with 'rules' substituted for 'M'; why can't I specify M as a tuning parameter?) will not complete:

m5Tuner <- expand.grid(
  pruned = c("Yes", "No"),
  smoothed = c("Yes", "No"),
  rules = c("Yes", "No")
)
m5Tune <- train(trainX, trainY,
                method = 'M5',
                trControl = ctrl,
                tuneGrid = m5Tuner,
                control = Weka_control(M = 10))
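On the question of M as a tuning parameter: a minimal sketch, assuming caret is installed, of how to list the parameters that train will accept in tuneGrid for a given method. modelLookup is a caret function; m5_params is just an illustrative name. Other Weka options such as M are passed through control = Weka_control(...), as in the call above.

```r
library(caret)

# List the tuning parameters caret exposes for the "M5" method.
# Only these column names are valid in a tuneGrid for this method.
m5_params <- modelLookup("M5")
print(m5_params[, c("parameter", "label")])
```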

The example from the book will not complete either:

library(AppliedPredictiveModeling)  # provides the solubility data
library(caret)
library(RWeka)                      # Weka_control
data(solubility)
set.seed(100)
indx <- createFolds(solTrainY, returnTrain = TRUE)
ctrl <- trainControl(method = "cv", index = indx)

set.seed(100)
m5Tune <- train(x = solTrainXtrans, y = solTrainY,
                method = "M5",
                trControl = ctrl,
                control = Weka_control(M = 10))

This may be a problem with using a parallel backend together with RWeka, at least for me. My manual example above will not complete with %dopar% either.
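To check whether a parallel backend is actually registered in the current session, something like the following should work. It is only a sketch: getDoParRegistered, getDoParName and getDoParWorkers are foreach functions, while parallel_backend_active is a hypothetical helper name introduced here for illustration.

```r
library(foreach)

# Hypothetical helper: TRUE when foreach has a backend registered
# with more than one worker, i.e. %dopar% would really run in parallel.
parallel_backend_active <- function() {
  getDoParRegistered() && getDoParWorkers() > 1
}

# Summarise what %dopar% would currently use.
backend_status <- list(
  registered = getDoParRegistered(),
  name       = getDoParName(),
  workers    = getDoParWorkers()
)
str(backend_status)
parallel_backend_active()
```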

Before each example I ran sudo R CMD javareconf and restarted RStudio.

sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
 [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
 [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
 [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
 [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
[11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] APMBook_0.0.0.9000              RWeka_0.4-27                   
[3] caret_6.0-68                    ggplot2_2.1.0                  
[5] lattice_0.20-30                 AppliedPredictiveModeling_1.1-6
# dozens others loaded via namespace.

When using parallel processing in train with RWeka models, you should get the error:

In train.default(trainX, trainY, method = "M5", trControl = ctrl,  :
 Models using Weka will not work with parallel processing with multicore/doMC

The Java interface to Weka does not work with multiple workers.

It takes a while, but the train call will complete if you do not have workers registered with foreach.
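A minimal sketch of getting back to a session without registered workers, assuming the foreach and caret packages: registerDoSEQ replaces whatever backend was registered with foreach's sequential one. The allowParallel argument of trainControl is shown as a second option, on the assumption that setting it to FALSE makes train resample sequentially even when a backend is registered. ctrl_seq is an illustrative name.

```r
library(foreach)

# Replace any registered parallel backend with the sequential one.
registerDoSEQ()
getDoParWorkers()

# Alternatively (assumption: this keeps train from using a registered
# backend), switch parallel resampling off in the control object.
if (requireNamespace("caret", quietly = TRUE)) {
  ctrl_seq <- caret::trainControl(method = "cv", allowParallel = FALSE)
}
```

After that, the train call from the question can be rerun unchanged, or with trControl = ctrl_seq (adding index = indx to keep the same folds).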

Max