Difference between data and newdata arguments when predicting

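When predicting in R with the predict() function, the argument for the data we want predictions on is newdata =. My question is: what happens when I pass data = instead of newdata =? It does not raise an error, yet the RMSE I get differs from the one I get with newdata =.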

Here is an example:

library(MASS)
set.seed(18)
Boston_idx = sample(1:nrow(Boston), nrow(Boston) / 2) 
Boston_train = Boston[Boston_idx,]
Boston_test  = Boston[-Boston_idx,]

library(rpart)
Boston_tree<-rpart(medv~., data=Boston_train)
tree.pred <- predict(Boston_tree, data=Boston_test)
tree.pred2 <- predict(Boston_tree, newdata=Boston_test)

rmse = function(m, o){
  sqrt(mean((m - o)^2))
}

rmse(tree.pred,Boston_test$medv)
rmse(tree.pred2,Boston_test$medv)

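data is the argument that supplies the data used to fit the model (as in rpart()); newdata is the argument that supplies the data to predict on. predict.rpart() has no data argument, so data = Boston_test is absorbed by ... and silently ignored, which leaves newdata missing. The help page ?predict.rpart says: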

newdata: data frame containing the values at which predictions are required. The predictors referred to in the right side of formula(object) must be present by name in newdata. If missing, the fitted values are returned.
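
To check this on the example from the question, here is a minimal sketch that compares the three calls. The names pred_data, pred_fitted and pred_newdata are only illustrative. If data = is indeed ignored, the first two results should be identical, because both are the fitted values for Boston_train. Note also that the split is exactly half of Boston, so the training and test sets have the same number of rows, which is why rmse() runs without any complaint about lengths.

```r
library(MASS)   # provides the Boston data set
library(rpart)

set.seed(18)
Boston_idx   <- sample(1:nrow(Boston), nrow(Boston) / 2)
Boston_train <- Boston[Boston_idx, ]
Boston_test  <- Boston[-Boston_idx, ]

Boston_tree <- rpart(medv ~ ., data = Boston_train)

# data = is not an argument of predict.rpart(); it falls into ... and is ignored
pred_data    <- predict(Boston_tree, data = Boston_test)
# no newdata at all: fitted values for the training data
pred_fitted  <- predict(Boston_tree)
# actual predictions for the held-out rows
pred_newdata <- predict(Boston_tree, newdata = Boston_test)

# should be TRUE if data = is ignored as described above
identical(pred_data, pred_fitted)

rmse <- function(m, o) sqrt(mean((m - o)^2))

# compares training-set fitted values with test-set responses: not meaningful
rmse(pred_data, Boston_test$medv)
# the test error that was actually intended
rmse(pred_newdata, Boston_test$medv)
```

So tree.pred in the question holds predictions for the rows of Boston_train, and comparing them with Boston_test$medv pairs up unrelated observations. Only the rmse computed from tree.pred2 is a test-set error.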