How to run predict.boosting for new data?

Question

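I am trying to use predict.boosting from the adabag package on new data. I cannot find a way to use it (or any other function in that package) on data that has no labels.

This is what I tried: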

pr <- predict.boosting(modelfit, test[,2:ncol(test)])

It gives:

Error in `[.data.frame`(newdata, , as.character(object$formula[[2]])) : 
  undefined columns selected

However, if I include the labels:

pr <- predict.boosting(modelfit, test)

it works fine. But there must be a way to use the model to predict on unlabeled data.

Thanks for your help!

Edit: example using the package:

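library(adabag)    # provides predict.boosting
library(rusboost)
library(rpart)
data(iris)

# Make it an unbalanced dataset by trimming most of the setosa observations
df <- iris[41:150, ]

# Create a binary variable
df$Setosa <- factor(ifelse(df$Species == "setosa", "setosa", "notsetosa"))

# Create an index of negative examples
idx <- df$Setosa == "notsetosa"

# Run the model
test.rusboost <- rusb(Setosa ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                      data = df, boot = FALSE, iters = 20, sampleFraction = .1, idx = idx)

# Works: df still contains the Setosa label column
predict.boosting(test.rusboost, df)
# Expected to fail with "undefined columns selected": the label column is missing
predict.boosting(test.rusboost, df[, 1:4])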

Answer 1

You should check that all columns in train (the set you used to train your model) are present in test and have the same names.

Please check:

all(colnames(train) %in% colnames(test))

If it is FALSE, you need to review how you build train and test.

If it is TRUE, and in general, please provide a reproducible example.
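To see which columns are missing rather than just TRUE or FALSE, a small base R sketch like the following may help. The train and test data frames here are toy data made up for illustration, not the asker's data.

```r
# Toy data frames for illustration only (assumed, not from the question).
train <- data.frame(Setosa = factor(c("setosa", "notsetosa")),
                    Sepal.Length = c(5.1, 6.3),
                    Sepal.Width = c(3.5, 2.9))
test <- train[, c("Sepal.Length", "Sepal.Width")]  # label column dropped

# TRUE only if every training column is also present in test.
all_present <- all(colnames(train) %in% colnames(test))

# Names of the training columns that test lacks.
missing_cols <- setdiff(colnames(train), colnames(test))

print(all_present)
print(missing_cols)
```

In this toy case the only missing column is the label, Setosa, which is the same situation as in the question.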

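Edit:

A good way to make sure the columns are the same and contain the same factor levels is to use sameShape from the dataPreparation package. If that is not the case, it adds the missing levels and columns (and warns you).

Usage: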

library(dataPreparation)
test <- sameShape(test, train)

Answer 2

I found a workaround: I append a column with the same name as the label to my new data and fill it with random factor levels.

df$Setosa <- factor(sample(c("setosa", "notsetosa"), nrow(df), replace = TRUE, prob = c(0.5, 0.5)))

Then it works fine.
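The error message in the question shows newdata being indexed with as.character(object$formula[[2]]), that is, the name of the response variable, which would explain why a placeholder column is enough. A minimal sketch of the workaround as a reusable function follows. Note that add_placeholder_label is a hypothetical helper written for this answer, not part of adabag, and the assumption that random placeholder values with the training levels are accepted rests only on the workaround described above.

```r
# Hypothetical helper (not part of adabag): add a placeholder label column
# so that newdata contains the response name used in the model formula.
add_placeholder_label <- function(newdata, formula, label_levels) {
  label <- as.character(formula[[2]])  # response name, e.g. "Setosa"
  if (!label %in% colnames(newdata)) {
    # Values are arbitrary; explicit levels keep both classes present.
    newdata[[label]] <- factor(
      sample(label_levels, nrow(newdata), replace = TRUE),
      levels = label_levels
    )
  }
  newdata
}

new_obs <- iris[1:5, 1:4]  # unlabeled rows
new_obs <- add_placeholder_label(new_obs, Setosa ~ ., c("notsetosa", "setosa"))
str(new_obs)

# With a fitted model (for example test.rusboost from the question):
# pred <- predict.boosting(test.rusboost, new_obs)
# pred$class  # predicted classes
```

The confusion matrix and error returned alongside the predictions are computed against the placeholder labels, so only the predicted classes (and the votes or probabilities) are meaningful for truly unlabeled data.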

