Difference between data and newdata arguments when predicting
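When predicting with the predict() function in R, the argument that takes the data we want predictions for is newdata =. My question is: what happens when I pass data = instead of newdata =? It does not raise an error, yet the RMSE I get differs from the one I get with newdata =.
Here is an example: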
library(MASS)   # provides the Boston data set
library(rpart)

set.seed(18)
# Split Boston into training and test halves
Boston_idx <- sample(1:nrow(Boston), nrow(Boston) / 2)
Boston_train <- Boston[Boston_idx, ]
Boston_test <- Boston[-Boston_idx, ]

Boston_tree <- rpart(medv ~ ., data = Boston_train)
tree.pred <- predict(Boston_tree, data = Boston_test)      # data = instead of newdata =
tree.pred2 <- predict(Boston_tree, newdata = Boston_test)

# Root mean squared error between predictions m and observations o
rmse <- function(m, o) {
  sqrt(mean((m - o)^2))
}
rmse(tree.pred, Boston_test$medv)
rmse(tree.pred2, Boston_test$medv)
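data is the data used to fit the model; newdata is the data used for prediction. The help page ?predict.rpart says:
newdata: data frame containing the values at which predictions are required. The predictors referred to in the right side of formula(object) must be present by name in newdata. If missing, the fitted values are returned.
predict.rpart has no data argument, so data = Boston_test is passed into ... and silently ignored. newdata is then missing, and tree.pred holds the fitted values for Boston_train rather than predictions for Boston_test, which is why the two RMSE values differ.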
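One way to check what data = actually does is to compare its result with a predict() call that passes no data at all. The sketch below reuses the setup from the question; the names pred_data, pred_fitted, pred_new and same_as_fitted are illustrative only, and it assumes the installed rpart version's predict.rpart has no data argument (as in the documented signature).

```r
library(MASS)   # Boston data set
library(rpart)

set.seed(18)
Boston_idx   <- sample(1:nrow(Boston), nrow(Boston) / 2)
Boston_train <- Boston[Boston_idx, ]
Boston_test  <- Boston[-Boston_idx, ]

Boston_tree <- rpart(medv ~ ., data = Boston_train)

# `data` is not a formal argument of predict.rpart, so it is ignored
pred_data   <- predict(Boston_tree, data = Boston_test)
# No newdata at all: fitted values for the training data
pred_fitted <- predict(Boston_tree)
# Actual predictions for the held-out rows
pred_new    <- predict(Boston_tree, newdata = Boston_test)

# TRUE if data = gave exactly the fitted values
same_as_fitted <- isTRUE(all.equal(pred_data, pred_fitted))
print(same_as_fitted)
```

If same_as_fitted is TRUE, then rmse(tree.pred, Boston_test$medv) in the question compares training-set fitted values against test-set responses. No length warning appears only because both halves happen to have the same number of rows.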