How to pick the best ntree value from a caret grid search?
I have manually tuned the parameters to find the best ntree:
bestMtry <- 3
control <- trainControl(method = 'repeatedcv',
                        number = 10,
                        repeats = 3,
                        search = 'grid')
storeMaxtrees <- list()
tuneGrid <- expand.grid(.mtry = bestMtry)
for (ntree in c(1000, 1500, 2000)) {
  set.seed(291)
  rf.maxtrees <- train(survived ~ .,
                       data = trainingSet,
                       method = "rf",
                       metric = "Accuracy",
                       tuneGrid = tuneGrid,
                       trControl = control,
                       importance = TRUE,
                       nodesize = 14,
                       maxnodes = 24,
                       ntree = ntree)
  key <- toString(ntree)
  storeMaxtrees[[key]] <- rf.maxtrees
}
resultsTree <- resamples(storeMaxtrees)
summary(resultsTree)
Output:
Call:
summary.resamples(object = resultsTree)

Models: 1000, 1500, 2000
Number of resamples: 30

Accuracy
          Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
1000 0.7865169 0.8181818 0.8305031 0.8335064 0.8498787 0.8764045    0
1500 0.7865169 0.8181818 0.8305031 0.8319913 0.8522727 0.8764045    0
2000 0.7865169 0.8181818 0.8305031 0.8327446 0.8522727 0.8764045    0

Kappa
          Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
1000 0.2700461 0.4243663 0.4786274 0.4753027 0.5252316 0.6281808    0
1500 0.2700461 0.4218811 0.4710053 0.4705338 0.5270828 0.6281808    0
2000 0.2700461 0.4218811 0.4786274 0.4721715 0.5270828 0.6281808    0
From the output, I can see that 2000 is the best ntree value based on Accuracy and Kappa. I want to store the best ntree value (2000) dynamically. Is there something like best_ntree <- resultsTree.bestTune?
You can store the result of the summary() call, for example:
library(caret)

bestMtry <- 3
control <- trainControl(method = 'repeatedcv', number = 5)
data <- MASS::Pima.tr  # example data set shipped with the MASS package
storeMaxtrees <- list()
tuneGrid <- expand.grid(.mtry = bestMtry)
for (ntree in c(1000, 1500, 2000)) {
  set.seed(291)  # same seed on every pass so the resamples are comparable
  rf.maxtrees <- train(type ~ .,
                       data = data,
                       method = "rf",
                       metric = "Accuracy",
                       tuneGrid = tuneGrid,
                       trControl = control,
                       importance = TRUE,
                       nodesize = 14,
                       maxnodes = 24,
                       ntree = ntree)
  # store each fitted model under its ntree value, e.g. "1000"
  key <- toString(ntree)
  storeMaxtrees[[key]] <- rf.maxtrees
}
resultsTree <- resamples(storeMaxtrees)
We can take the one with the highest mean accuracy:
res = summary(resultsTree)
res$models[which.max(res$statistics$Accuracy[,"Mean"])]
[1] "1500"
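The question also looks at Kappa, so here is a minimal sketch of the same lookup that ranks by mean Accuracy and uses mean Kappa only to break ties. To keep it runnable without refitting anything, `res` is a hand-built stand-in that has just the two fields used here (`models` and `statistics`), filled with the means printed in the question; with a real `summary(resultsTree)` object you would skip that part. The name `best_model` is only for illustration.

```r
# Stand-in for summary(resamples(...)): only the fields used below.
# The means are copied from the Accuracy and Kappa tables in the question.
res <- list(
  models = c("1000", "1500", "2000"),
  statistics = list(
    Accuracy = cbind(Mean = c(0.8335064, 0.8319913, 0.8327446)),
    Kappa    = cbind(Mean = c(0.4753027, 0.4705338, 0.4721715))
  )
)

acc   <- res$statistics$Accuracy[, "Mean"]
kappa <- res$statistics$Kappa[, "Mean"]

# order() sorts by the first key and falls back to the second on ties
ranking <- order(acc, kappa, decreasing = TRUE)
best_model <- res$models[ranking[1]]
best_model
```

With the means printed in the question this picks "1000", since that row has the highest mean for both metrics.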
The result is a character string ("1500" in my example), so convert it with as.numeric() if you need a number.
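As a rough sketch, the lookup and the conversion can be wrapped in one small helper. `pick_best_ntree` is a made-up name, not a caret function, and it assumes the models were stored under keys that are plain numbers (as `toString(ntree)` produces in the loop); otherwise `as.numeric()` returns NA with a warning. The `res` object below is again a stand-in built from the means in the question, so the snippet runs on its own.

```r
# Hypothetical helper (not part of caret): return the ntree whose
# resampled metric is highest, as a number rather than a string.
pick_best_ntree <- function(res, metric = "Accuracy", stat = "Mean") {
  scores <- res$statistics[[metric]][, stat]
  as.numeric(res$models[which.max(scores)])
}

# Stand-in for summary(resultsTree); means copied from the question.
res <- list(
  models = c("1000", "1500", "2000"),
  statistics = list(
    Accuracy = cbind(Mean = c(0.8335064, 0.8319913, 0.8327446)),
    Kappa    = cbind(Mean = c(0.4753027, 0.4705338, 0.4721715))
  )
)

best_ntree <- pick_best_ntree(res)
best_ntree
```

With a real fit you would call it as `best_ntree <- pick_best_ntree(summary(resultsTree))` and could then pass `best_ntree` on as the `ntree` argument of a final `train()` call.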