Random Forest Crossvalidation in R

I am working with random forests in R and would like to add 10-fold cross-validation to my model, but I am stuck. Here is a sample of my code.

install.packages('randomForest')
library(randomForest)
set.seed(123)
fit <- randomForest(as.factor(sickrabbit) ~ Feature1 + ... + FeatureN, data = training1, importance = TRUE, sampsize = c(200, 300), ntree = 500)

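I found the function rfcv in caret online, but I am not sure how it works. Can anyone help with this function or suggest an easier way to implement cross-validation? Can it be done with the randomForest package instead of caret?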

You don't need to cross-validate a random forest model. You are getting stuck with the randomForest package because it wasn't designed to do this.

Here is a snippet from Breiman's official documentation:

In random forests, there is no need for cross-validation or a separate test set to get an unbiased estimate of the test set error. It is estimated internally, during the run, as follows:

Each tree is constructed using a different bootstrap sample from the original data. About one-third of the cases are left out of the bootstrap sample and not used in the construction of the kth tree.

Put each case left out in the construction of the kth tree down the kth tree to get a classification. In this way, a test set classification is obtained for each case in about one-third of the trees. At the end of the run, take j to be the class that got most of the votes every time case n was oob. The proportion of times that j is not equal to the true class of n averaged over all cases is the oob error estimate. This has proven to be unbiased in many tests.
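To make the quoted passage concrete, here is a minimal sketch of reading the OOB error off a fitted forest and comparing it with a hand-rolled 10-fold cross-validation. It uses the built-in iris data only as a stand-in for training1, and the names dat, folds, fold_error, oob_error and cv_error are illustrative, not part of the original code.

```r
library(randomForest)

set.seed(123)

# iris is a built-in dataset, used here only as a stand-in for training1
dat <- iris

# Fit once on all rows; the OOB estimate is computed during the run
fit <- randomForest(Species ~ ., data = dat, ntree = 500)

# Last row of err.rate holds the OOB error after all trees are grown
oob_error <- unname(fit$err.rate[fit$ntree, "OOB"])

# Manual 10-fold cross-validation with the same settings, for comparison
k <- 10
folds <- sample(rep(seq_len(k), length.out = nrow(dat)))  # random fold labels
fold_error <- numeric(k)
for (i in seq_len(k)) {
  held_out <- folds == i
  fit_i <- randomForest(Species ~ ., data = dat[!held_out, ], ntree = 500)
  pred_i <- predict(fit_i, newdata = dat[held_out, ])
  fold_error[i] <- mean(pred_i != dat$Species[held_out])
}
cv_error <- mean(fold_error)

print(c(oob = oob_error, cv10 = cv_error))
print(fit$confusion)  # OOB confusion matrix with per-class error
```

On a small, clean dataset like this the two numbers would be expected to land close together, which is the point the documentation is making. The exact values depend on the seed and the data, so treat the comparison as a sanity check rather than a guarantee.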