How to detect heteroscedasticity in a Random Forest model?
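I'm building a random forest regression model and want to check whether it has heteroscedasticity.
When I developed a linear model I found heteroscedasticity, with the curve shown in the figure below. I'd like to check a similar residual plot for the random forest model.
I'm working in R.
It's an expense model based on Income, Branch and TotalFamilyMember.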
We can recreate the plot using residuals computed from the predicted values:
# Using the regression example from ?randomForest
library(randomForest)
# Assumption: airq is the airquality data with incomplete rows dropped
airq <- na.omit(airquality)
set.seed(131)
ozone.rf <- randomForest(Ozone ~ ., data = airq, mtry = 3,
                         importance = TRUE)
# Residuals = actual minus predicted (ozone.rf$predicted holds out-of-bag predictions)
err <- airq$Ozone - ozone.rf$predicted
# Data frame holding residuals and fitted values
df <- data.frame(Residuals = err, Fitted.Values = ozone.rf$predicted)
# Sort data by fitted values
df2 <- df[order(df$Fitted.Values), ]
# Residuals vs. fitted values
plot(Residuals ~ Fitted.Values, data = df2)
# Horizontal reference line at zero residual, grey (color 8)
abline(h = 0, col = 8)
# Lowess smoother, the same kind plot.lm draws on its residuals-vs-fitted panel, red (color 2)
lines(lowess(df2$Fitted.Values, df2$Residuals), col = 2)
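The plot is a visual check. As a rough numeric companion, the sketch below splits the observations into quantile bins of the fitted values and reports the residual standard deviation in each bin; a spread that grows or shrinks steadily across bins is the "fan" pattern that suggests heteroscedasticity. The helper name `residual_spread_by_bin`, the five-bin default and the simulated data are illustrative assumptions, not part of randomForest.

```r
# Hypothetical helper (not part of randomForest): residual SD within
# quantile bins of the fitted values.
residual_spread_by_bin <- function(res, fit_vals, n_bins = 5) {
  stopifnot(length(res) == length(fit_vals), n_bins >= 2)
  # Bin edges at quantiles of the fitted values; unique() drops tied edges
  breaks <- unique(quantile(fit_vals, probs = seq(0, 1, length.out = n_bins + 1)))
  bin <- cut(fit_vals, breaks = breaks, include.lowest = TRUE)
  data.frame(
    bin = levels(bin),
    n = as.vector(table(bin)),
    sd_residual = as.vector(tapply(res, bin, sd))
  )
}

# Simulated example (assumption): noise SD grows with the fitted value
set.seed(1)
x <- runif(500, 1, 10)
fit_demo <- 2 * x
res_demo <- rnorm(500, sd = 0.5 * x)
spread_demo <- residual_spread_by_bin(res_demo, fit_demo)
print(spread_demo)
```

For the ozone example, `residual_spread_by_bin(err, ozone.rf$predicted)` would give the same kind of table; the sign convention of the residuals does not affect the SDs.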
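Update
There's a simpler way. The plot above is essentially the residuals-vs-fitted panel of plot.lm, so fitting a regression of the residuals on the fitted values and plotting it with which=1 gives a very similar plot. (Strictly, plot.lm then shows that auxiliary model's own residuals and fitted values, so any linear trend in the random forest residuals is removed first.)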
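# Out-of-bag predictions and the response stored in the randomForest object
fitted.values <- ozone.rf$predicted
# Residuals = actual minus predicted
residuals <- ozone.rf$y - fitted.values
plot(lm(residuals ~ fitted.values), which = 1)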
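If a formal number is wanted alongside the plot, one option is a Koenker-style (studentized) Breusch-Pagan statistic computed by hand: regress the squared residuals on the fitted values and compare n·R² with a chi-squared distribution on 1 degree of freedom. The test was designed for linear-model residuals, so applying it to out-of-bag random forest residuals is a heuristic assumption rather than an established forest diagnostic; the helper name `bp_statistic` and the simulated data are also hypothetical.

```r
# Hypothetical helper: Koenker-style Breusch-Pagan statistic on given residuals.
# Regress squared residuals on fitted values; under constant variance,
# n * R^2 is approximately chi-squared with 1 degree of freedom.
bp_statistic <- function(res, fit_vals) {
  stopifnot(length(res) == length(fit_vals))
  sq_res <- res^2
  aux <- lm(sq_res ~ fit_vals)
  stat <- length(res) * summary(aux)$r.squared
  c(statistic = stat, p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

# Simulated heteroscedastic example (assumption): noise SD grows with x
set.seed(1)
x <- runif(500, 1, 10)
fit_demo <- 2 * x
res_demo <- rnorm(500, sd = 0.5 * x)
bp_demo <- bp_statistic(res_demo, fit_demo)
print(bp_demo)
```

With the objects from the update, `bp_statistic(residuals, fitted.values)` would apply it to the forest's out-of-bag residuals; a small p-value points to residual variance that changes with the fitted value.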