"ROC" metric not in result set
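I am trying to build a random forest model with the caret package, using the area under the ROC curve as the training metric, but I get the following warning: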
Warning message:
In train.default(x = TrainData, y = TrainClasses, method = "rf", :
The metric "ROC" was not in the result set. Accuracy will be used instead.
Obviously this is not what I want, but I can't see where I went wrong.
Here is a reproducible example:
library(caret)
library(doParallel)
library(data.table)
cl <- makeCluster(detectCores() - 1) # I'm using 3 cores.
registerDoParallel(cl)
data(iris)
iris <- iris[iris$Species != 'virginica',] # to get two categories
TrainData <- as.data.table(iris[,1:4]) # My data is a data.table.
TrainClasses <- as.factor(as.character(iris[,5])) # to reset the levels to the two remaining flower types.
ctrl <- trainControl(method = 'oob',
                     classProbs = TRUE,
                     verboseIter = TRUE,
                     summaryFunction = twoClassSummary,
                     allowParallel = TRUE)
model.fit <- train(x = TrainData,
                   y = TrainClasses,
                   method = 'rf',
                   metric = 'ROC',
                   tuneLength = 3,
                   trControl = ctrl)
The result is the same if I don't create the parallel cluster and set allowParallel = FALSE.
In case it is useful, here is the output of sessionInfo():
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C LC_TIME=English_Australia.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] randomForest_4.6-10 data.table_1.9.6 doParallel_1.0.10 iterators_1.0.7 foreach_1.4.3
[6] caret_6.0-52 ggplot2_1.0.1 lattice_0.20-33
loaded via a namespace (and not attached):
[1] Rcpp_0.12.1 compiler_3.2.2 nloptr_1.0.4 plyr_1.8.3 tools_3.2.2
[6] digest_0.6.8 lme4_1.1-8 nlme_3.1-121 gtable_0.1.2 mgcv_1.8-7
[11] Matrix_1.2-2 brglm_0.5-9 SparseM_1.6 proto_0.3-10 BradleyTerry2_1.0-6
[16] stringr_1.0.0 gtools_3.5.0 stats4_3.2.2 grid_3.2.2 nnet_7.3-10
[21] minqa_1.2.4 reshape2_1.4.1 car_2.0-26 magrittr_1.5 scales_0.3.0
[26] codetools_0.2-14 MASS_7.3-44 splines_3.2.2 pbkrtest_0.4-2 colorspace_1.2-6
[31] quantreg_5.11 stringi_0.5-5 munsell_0.4.2 chron_2.3-47
Thanks. I look forward to getting this resolved!
You are right. When you choose method = "oob", AUC-ROC is not one of the metrics returned.
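To illustrate the point above, here is a minimal sketch that replaces the OOB estimate with 5-fold cross-validation, a resampling method for which caret does apply summaryFunction, so twoClassSummary can report ROC. The choice of method = "cv", number = 5, the seed, and the names two_class, ctrl_cv and fit_cv are assumptions made for this example and are not part of the question. It also uses a plain data.frame rather than a data.table to keep the sketch simple, and it assumes the caret and randomForest packages (plus whatever twoClassSummary needs in your caret version) are installed.

```r
library(caret)

data(iris)
# Keep two species and drop the unused factor level
two_class <- droplevels(iris[iris$Species != "virginica", ])

set.seed(42)  # arbitrary seed, only for reproducible folds

# Cross-validation instead of OOB: summaryFunction is used here
ctrl_cv <- trainControl(method = "cv",
                        number = 5,
                        classProbs = TRUE,
                        summaryFunction = twoClassSummary)

fit_cv <- train(x = two_class[, 1:4],
                y = two_class$Species,
                method = "rf",
                metric = "ROC",
                tuneLength = 3,
                trControl = ctrl_cv)

# The result set now contains ROC, Sens and Spec for each mtry value
print(fit_cv$results[, c("mtry", "ROC", "Sens", "Spec")])
```

The trade-off is run time: cross-validation refits the forest once per fold, whereas the OOB estimate comes from a single fit.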
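You need to dig into the source code a little to find where the metrics are computed. They are computed by method$oob, which is called by oobTrainWorkflow (line 19), which in turn is called by train.default (line 258). In your case, method is models$rf, where the object models is loaded from an external package file named models.RData: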
load(system.file("models", "models.RData", package = "caret"))
You can inspect the oob method of models$rf (which is the same as method):
function(x) {
  out <- switch(x$type,
                regression = c(sqrt(max(x$mse[length(x$mse)], 0)), x$rsq[length(x$rsq)]),
                classification = c(1 - x$err.rate[x$ntree, "OOB"],
                                   e1071::classAgreement(x$confusion[,-dim(x$confusion)[2]])[["kappa"]]))
  names(out) <- if(x$type == "regression") c("RMSE", "Rsquared") else c("Accuracy", "Kappa")
  out
}
You can see that only the Accuracy and Kappa metrics are computed when a classification RF is requested.
You could adapt method$oob to use method$prob(mod$fit) and compute the AUC-ROC.
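As a rough sketch of the calculation such an adapted function would need, the helper below computes the AUC from predicted probabilities of the positive class and the true labels, using the rank-based (Mann-Whitney) formulation in base R. The name auc_from_probs and its arguments are hypothetical and are not part of caret. For a randomForest classification fit, the OOB vote fractions stored in the fit's votes component would be one possible source of such probabilities, but wiring this into method$oob is left as an assumption to verify against your caret version.

```r
# Hypothetical helper (not a caret function): AUC via the rank-sum identity.
#   prob_pos : numeric vector, predicted probability of the positive class
#   truth    : factor or character vector of true class labels
#   positive : label treated as the positive class
auc_from_probs <- function(prob_pos, truth, positive) {
  is_pos <- truth == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  if (n_pos == 0 || n_neg == 0) return(NA_real_)  # AUC undefined with one class
  r <- rank(prob_pos)  # average ranks, so ties count as half
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data: three of the four positive/negative pairs are ordered correctly
truth <- factor(c("a", "a", "b", "b"))
prob_b <- c(0.10, 0.40, 0.35, 0.80)
auc_toy <- auc_from_probs(prob_b, truth, positive = "b")
print(auc_toy)  # 0.75
```

An adapted oob function would append this value to the vector it returns and name the new element "ROC", alongside "Accuracy" and "Kappa".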