Repeated CV in tuneRanger
I'm using the tuneRanger package to tune an RF model. It works well and I'm getting good results, but I'm not sure whether it is overfitting my model. I'd like to use repeated CV for each instance in which the package tunes the model, but I can't find a way to do it. I'd also like to know how this package validates the result of each trial (train-test split, CV, repeated CV?). I've been reading the package manual (https://cran.r-project.org/web/packages/tuneRanger/tuneRanger.pdf) but it says nothing about it.
Thanks for your help.
Out-of-bag estimation is used to estimate the error, and I don't think you can switch to CV with this package. Whether CV is better than that is up to you to decide. In the publication linked from their readme, under section 3.5, they write:
Out-of-bag predictions are used for evaluation, which makes it much faster than other packages that use evaluation strategies such as cross-validation
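To make the OOB-based workflow concrete, here is a minimal sketch of how tuneRanger is typically invoked (it takes an mlr task; evaluation happens internally on out-of-bag predictions, not on a CV resampling you control). The `num.trees` value is illustrative, not a recommendation.

```r
# Sketch of a tuneRanger call, assuming the mlr task API.
library(tuneRanger)
library(mlr)

# Build a classification task from the data.
task <- makeClassifTask(data = iris, target = "Species")

# Tune mtry, min.node.size and sample.fraction; each candidate is
# scored on out-of-bag predictions rather than cross-validation.
res <- tuneRanger(task, num.trees = 500)

res$recommended.pars  # tuned hyperparameters and their OOB performance
```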
If you want to use cross-validation or repeated cross-validation, you have to use caret instead, for example:
library(caret)
mdl <- train(Species ~ ., data = iris, method = "ranger",
             trControl = trainControl(method = "repeatedcv", repeats = 2),
             tuneGrid = expand.grid(mtry = 2:3, min.node.size = 1:2,
                                    splitrule = "gini"))
Random Forest

150 samples
  4 predictor
  3 classes: 'setosa', 'versicolor', 'virginica'

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 2 times)
Summary of sample sizes: 135, 135, 135, 135, 135, 135, ...
Resampling results across tuning parameters:

  mtry  min.node.size  Accuracy  Kappa
  2     1              0.96      0.94
  2     2              0.96      0.94
  3     1              0.96      0.94
  3     2              0.96      0.94

Tuning parameter 'splitrule' was held constant at a value of gini
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were mtry = 2, splitrule = gini
and min.node.size = 1.
The parameters you can tune will differ for your data. I believe mlr also lets you perform cross-validation, but the same restriction applies: the CV happens in mlr's tuning loop, not inside tuneRanger.
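For completeness, a repeated-CV tuning loop can also be sketched with mlr's own resampling machinery (mlr v2 API; the parameter grid below mirrors the caret example and is illustrative only):

```r
# Sketch: repeated 10-fold CV tuning of ranger via mlr (v2 API).
library(mlr)

task    <- makeClassifTask(data = iris, target = "Species")
learner <- makeLearner("classif.ranger")

# Same illustrative grid as the caret example above.
ps <- makeParamSet(
  makeDiscreteParam("mtry", values = 2:3),
  makeDiscreteParam("min.node.size", values = 1:2)
)

rdesc <- makeResampleDesc("RepCV", folds = 10, reps = 2)
ctrl  <- makeTuneControlGrid()

res <- tuneParams(learner, task = task, resampling = rdesc,
                  par.set = ps, control = ctrl)
res$x  # hyperparameters selected by repeated CV
```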