Random forest: error in dealing with factor levels in R

Question

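I am using a random forest (rf) model in R to predict a binary outcome, 0 or 1. My input data contains categorical variables (coded as numbers), which are encoded as factors for training. I use R's factor() function to convert each variable into a factor, so for every categorical variable x my code looks like this: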

feature_x1=factor(feature_x1) # Convert the variable into factor in training data. 
#This variable takes 3 levels 0,1,2

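This works fine when training the model. Suppose my model object is rf_model. The new data I want to score is just a vector of numbers, so I first convert the value for feature_x1 into a factor: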

newdata=data.frame(1,2)
colnames(newdata)=c("feature_x1","feature_x2")
newdata$feature_x1=factor(newdata$feature_x1)
score=predict(rf_model,newdata,type="prob")

I get the following error:

Error in predict.randomForest(rf_model, newdata,type = "prob") : New factor levels not present in the training data

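How do I handle this error? In practice, once a model is trained, we always have to score data whose outcome is unknown, and here that is just a single record.

Let me know if more clarification or code is needed.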

Answer 1

Try setting the levels explicitly, so the new factor uses the same levels as the training variable:

newdata$feature_x1 <- factor(newdata$feature_x1, levels=levels(feature_x1))
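
As a minimal sketch of why this helps, the base R snippet below compares the levels produced by factor() alone with the levels produced when the training levels are passed in. The training column train_x1 and its values are made up for illustration (the question only says the variable takes the levels 0, 1, 2), and no model is fitted here, so this shows the level mismatch rather than reproducing the exact error.

```r
# Hypothetical training column with the three levels from the question.
train_x1 <- factor(c(0, 1, 2, 1, 0))
levels(train_x1)                      # "0" "1" "2"

# A single new record, as in the question.
newdata <- data.frame(feature_x1 = 1, feature_x2 = 2)

# factor() on its own derives the levels from the new data only.
naive <- factor(newdata$feature_x1)
levels(naive)                         # "1"

# Reusing the training levels keeps the encoding identical to training.
aligned <- factor(newdata$feature_x1, levels = levels(train_x1))
levels(aligned)                       # "0" "1" "2"

# A value never seen in training becomes NA rather than a new level.
unseen <- factor(3, levels = levels(train_x1))
is.na(unseen)                         # TRUE
```

With the levels aligned, the new record is encoded the same way the model saw it during training. A value that was never present in the training data turns into NA, so it still has to be handled separately before calling predict().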
