Getting random forest prediction accuracy for a continuous variable in R

Question

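I am trying to predict a continuous variable (a count) in R using random forest. The predicted variable ranges from min=1 to max=1000.

I tried to get the prediction accuracy with "confusionMatrix", but naturally I get an error saying that the predictions and the reference values have a different number of levels.

What is the best way to get prediction accuracy in this situation?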

Answer 1

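randomForest should only show a confusion matrix when the outcome is categorical (a factor), so make sure your outcome is numeric. The model is then fitted as a regression, and the printed summary shows the mean of squared residuals and the percentage of variance explained instead. For example: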

library(randomForest)
# This is probably what you're seeing
randomForest(as.factor(Sepal.Length) ~ Sepal.Width, data=iris)
# This is what you want to see
randomForest(Sepal.Length ~ Sepal.Width, data=iris)
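If you need those numbers programmatically rather than from the printed summary, a minimal sketch follows. It assumes the `mse` and `rsq` components that a regression `randomForest` object carries (one value per tree, computed on out-of-bag samples). The seed and the names `rf`, `oob_mse`, `oob_rsq` and `oob_rmse` are illustrative choices, not part of the original answer.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only so the run is reproducible
rf <- randomForest(Sepal.Length ~ Sepal.Width, data = iris)

rf$type                        # "regression" when the response is numeric

# mse and rsq hold one value per tree; the last entry describes the full forest
oob_mse  <- rf$mse[rf$ntree]   # out-of-bag mean of squared residuals
oob_rsq  <- rf$rsq[rf$ntree]   # pseudo R-squared: 1 - mse / Var(y)
oob_rmse <- sqrt(oob_mse)      # same units as Sepal.Length

c(MSE = oob_mse, RMSE = oob_rmse, Rsq = oob_rsq)
```

The "% Var explained" line in the printed summary corresponds to `100 * oob_rsq`.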

Answer 2

@mishakob

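Roughly speaking, the root mean squared error (RMSE) is the typical size of the deviation between actual and fitted values, expressed in the units of the response. Note that `iris.rg$predicted` holds out-of-bag predictions, so the value below is an out-of-bag estimate rather than an in-sample fit. It can be obtained as follows.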

library(randomForest)
set.seed(1237)
iris.rg <- randomForest(Sepal.Length ~ ., data=iris, importance=TRUE,
                        proximity=TRUE)

sqrt(sum((iris.rg$predicted - iris$Sepal.Length)^2) / nrow(iris))
[1] 0.3706187
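The same idea carries over to data the forest has never seen. The sketch below is one possible setup and not part of the original answer: the 70/30 split and the names `train_idx`, `fit`, `pred` and `obs` are assumptions made for illustration. It uses only `randomForest()`, `predict()` and base R arithmetic.

```r
library(randomForest)

set.seed(1237)
n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))  # assumed 70/30 split
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

fit  <- randomForest(Sepal.Length ~ ., data = train)
pred <- predict(fit, newdata = test)   # predictions for the held-out rows
obs  <- test$Sepal.Length

rmse <- sqrt(mean((pred - obs)^2))     # root mean squared error
mae  <- mean(abs(pred - obs))          # mean absolute error
r2   <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

c(RMSE = rmse, MAE = mae, R2 = r2)
```

If caret is already loaded (the question uses its `confusionMatrix`), `postResample(pred, obs)` reports RMSE, R-squared and MAE for numeric outcomes in a single call.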
