Calculate R-squared (%Var explained) from combined randomForest regression object
When calculating a randomForest regression, the object includes the R-squared as "% Var explained: ...".
library(randomForest)
library(doSNOW)
library(foreach)
library(ggplot2)
# first 1000 rows, columns carat through price
dat <- data.frame(ggplot2::diamonds[1:1000,1:7])
# a single forest of 500 trees, fitted in one call
rf <- randomForest(formula = carat ~ ., data = dat, ntree = 500)
rf
# Call:
# randomForest(formula = carat ~ ., data = dat, ntree = 500)
# Type of random forest: regression
# Number of trees: 500
# No. of variables tried at each split: 2
#
# Mean of squared residuals: 0.001820046
# % Var explained: 95.22
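However, when multiple randomForest objects are merged with combine in a foreach loop, the R-squared value is not available, as stated in the combine help page (?combine):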
The confusion, err.rate, mse and rsq components (as well as the corresponding components in the test component, if exist) of the combined object will be NULL
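# 8 iterations of 63 trees each, spread over 8 workers (8 x 63 = 504 trees in total)
cl <- makeCluster(8)
registerDoSNOW(cl)
rfPar <- foreach(ntree = rep(63, 8),
                 .combine = combine,        # randomForest::combine merges the forests
                 .multicombine = TRUE,
                 .packages = "randomForest") %dopar%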
{
randomForest(formula = carat ~ ., data = dat, ntree = ntree)
}
stopCluster(cl)
rfPar
# Call:
# randomForest(formula = carat ~ ., data = dat, ntree = ntree)
# Type of random forest: regression
# Number of trees: 504
# No. of variables tried at each split: 2
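Since this was not really answered in this question: is it possible to calculate the R-squared (% Var explained) and the mean of squared residuals from the randomForest object after the fact?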
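(Critics of this kind of parallelization might argue for using caret::train(... method = "parRF") or something similar instead. However, that takes a long time. In fact, this could be useful for anyone who uses combine to merge randomForest objects...)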
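Yes. You can calculate the R-squared value after the fact by comparing the predictions that the trained model makes on the training data with the actual values: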
# taking the object from the question:
actual <- dat$carat
predicted <- unname(predict(rfPar, dat))
R2 <- 1 - (sum((actual-predicted)^2)/sum((actual-mean(actual))^2))
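To get both figures that randomForest normally prints, a small helper can compute them together. This is a base R sketch; the function name rf_metrics and the toy vectors are hypothetical and not part of randomForest. The randomForest help page describes rsq as 1 - mse / Var(y); here the variance is taken with divisor n, which makes 1 - mse / var_y algebraically identical to the R2 formula above. One caveat: randomForest reports its "% Var explained" from out-of-bag predictions, whereas predict(rfPar, dat) returns in-sample predictions for the training rows, so values computed this way will typically look better than the OOB figure.

```r
# Hypothetical helper (not part of randomForest): mean of squared residuals
# and R-squared computed from actual and predicted values.
rf_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  n <- length(actual)
  mse <- sum((actual - predicted)^2) / n          # mean of squared residuals
  var_y <- sum((actual - mean(actual))^2) / n     # variance with divisor n
  rsq <- 1 - mse / var_y                          # equals 1 - SSR / SST
  c(mse = mse, rsq = rsq, pct_var_explained = 100 * rsq)
}

# Toy data for illustration only
toy_actual    <- c(1, 2, 3, 4)
toy_predicted <- c(1.1, 1.9, 3.2, 3.8)
toy_metrics <- rf_metrics(toy_actual, toy_predicted)
toy_metrics  # mse = 0.025, rsq = 0.98 (up to floating-point rounding)

# With the objects from the question:
# rf_metrics(dat$carat, unname(predict(rfPar, dat)))
```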
Or the root mean squared error (caret::RMSE returns the square root of the mean of squared residuals):
caret::RMSE(predicted,actual)
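To compare against the "Mean of squared residuals" line that randomForest prints, the RMSE has to be squared, or the mean of the squared residuals computed directly. A base R sketch; obs and pred are made-up illustrative vectors, not data from the question:

```r
# Toy vectors for illustration only (not from the question's data)
obs  <- c(2.0, 4.0, 6.0)
pred <- c(2.5, 3.5, 6.0)

mse  <- mean((obs - pred)^2)   # comparable to "Mean of squared residuals"
rmse <- sqrt(mse)              # what caret::RMSE(pred, obs) computes

# Squaring the RMSE gives back the MSE
all.equal(rmse^2, mse)  # TRUE

# With the question's objects (requires caret):
# caret::RMSE(predicted, actual)^2
# mean((actual - predicted)^2)
```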