Confusing confusion matrix parameters changing output

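I have fitted a random forest model for prediction. When I run the code below, I get two different confusion matrices. The only difference is that in one I pass data = train to the predict function and in the other I just pass train. Why does this make such a big difference? One of them has much worse recall.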

conf.matrix <- table(train$Status,predict(fit2,train))

               Pred:Churn Pred:Current
  Actual:Churn         2543          984
  Actual:Current         44        27206

conf.matrix <- table(train$Status,predict(fit2,data = train))

                Pred:Churn Pred:Current
  Actual:Churn         1609         1918
  Actual:Current        464        26786

Many thanks.

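The data argument in the second example is ignored because the correct argument name is newdata, as pointed out by @mtoto and @agenis. When newdata is not supplied, predict.randomForest returns the model's out-of-bag predictions.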

The out-of-bag predictions are probably what you want here.
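A minimal sketch of why the misnamed argument goes unnoticed, assuming the randomForest package is installed. The built-in iris data and the name fit are stand-ins for the asker's train and fit2, which are not available here. Because data is not a formal argument of predict.randomForest and is not a prefix of newdata, it falls into `...` and is dropped without a warning:

```r
library(randomForest)

set.seed(42)
# iris stands in for the asker's training data (an assumption for illustration)
fit <- randomForest(Species ~ ., data = iris, ntree = 200)

oob_default <- predict(fit)                  # no newdata: out-of-bag predictions
oob_typo    <- predict(fit, data = iris)     # `data` is swallowed by `...`, so still out-of-bag
resub       <- predict(fit, newdata = iris)  # every tree scores every training row
positional  <- predict(fit, iris)            # second positional argument is matched to newdata

# The misnamed call is indistinguishable from calling predict(fit) with no data at all
same_as_oob <- identical(oob_default, oob_typo)
print(same_as_oob)
```

Passing the data frame positionally, as in the asker's first call, is matched to newdata, which is why the two calls in the question behave differently.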

From a post on CrossValidated:

Be aware that there's a difference between

predict(model)

and

predict(model, newdata=train)

when getting predictions for the training dataset. The first option gets the out-of-bag predictions from the random forest. This is generally what you want, when comparing predicted values to actuals on the training data.

The second treats your training data as if it was a new dataset, and runs the observations down each tree. This will result in an artificially close correlation between the predictions and the actuals, since the RF algorithm generally doesn't prune the individual trees, relying instead on the ensemble of trees to control overfitting. So don't do this if you want to get predictions on the training data.
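The difference described above can be seen by building both confusion matrices side by side. This is only a sketch under stated assumptions: it uses the randomForest package with the built-in iris data as a stand-in for the asker's churn data, and the names train, fit, oob_cm and resub_cm are chosen for illustration. The exact counts depend on the seed and the data, so none are quoted here.

```r
library(randomForest)

set.seed(1)
train <- iris  # stand-in training set (assumption; the asker's data is not available)
fit <- randomForest(Species ~ ., data = train, ntree = 200)

# Out-of-bag: each row is predicted only by trees that did not see it during fitting
oob_pred <- predict(fit)

# Resubstitution: the training rows are treated as new data and run down every tree
resub_pred <- predict(fit, newdata = train)

oob_cm   <- table(Actual = train$Species, Predicted = oob_pred)
resub_cm <- table(Actual = train$Species, Predicted = resub_pred)

# Overall accuracy from each matrix (correct predictions sit on the diagonal)
oob_acc   <- sum(diag(oob_cm)) / sum(oob_cm)
resub_acc <- sum(diag(resub_cm)) / sum(resub_cm)

print(oob_cm)
print(resub_cm)
cat("OOB accuracy:", oob_acc, " resubstitution accuracy:", resub_acc, "\n")
```

The out-of-bag matrix is the one to report as an estimate of performance on unseen data; the resubstitution matrix will typically look better because most trees have already seen each row.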