Error in predict when using C5.0 with costs and predict with type="prob" to draw ROC curves in R

Question

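I am drawing ROC curves for a series of classifiers that I have implemented. The problem is that when I have a C5.0 classifier with a cost matrix (I am using RStudio), I get the following error message: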

Error in predict.C5.0(classifier.cost.1, data, type="prob"): confidence values (i.e. class probabilities) should not be used with costs.

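The classifier itself is fine, and prediction works when I do not use type="prob" in the predict call, but then I cannot draw the ROC curve.

This is the code I am using to create my own ROC curves: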

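library(C50)   # C5.0() and its predict method
library(ROCR)  # prediction() and performance()
# This call raises the error above, because classifier.cost.1 was trained with costs
pred.class.cost <- predict(classifier.cost.1, data, type="prob")
perf.class.cost <- performance(prediction(pred.class.cost[,2], data$class),"tpr","fpr")
ROC.class.cost <- data.frame(x=perf.class.cost@x.values[[1]],y=perf.class.cost@y.values[[1]])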

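So I have two questions:

What does the error mean, and how can I fix it?
If it cannot be fixed, is there another way to create my own ROC curves? (I then use ggplot2 to take all the ROC curves and plot them together.)

Any help would be appreciated. Thanks!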

Answer 1

The predict section of the C5.0 documentation explains:

When the cost argument is used in the main function, class probabilities derived from the class distribution in the terminal nodes may not be consistent with the final predicted class. For this reason, requesting class probabilities from a model using unequal costs will throw an error.

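To get around this, assuming you want to give more weight to the positive class, you can either oversample the positives or undersample the negatives (I prefer the latter). This has an effect similar to applying costs, and it then allows you to obtain class probabilities and generate the ROC curve.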
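A minimal sketch of the undersampling route, under some assumptions: the data frame below is simulated as a hypothetical stand-in for your `data`, the class levels "neg" and "pos" and the names `train`, `classifier.under`, `pred.under`, `perf.under` and `ROC.under` are made up for illustration, and the negatives are cut down to a 1:1 ratio, which is only one possible choice. The model is fitted without the costs argument, so type="prob" is accepted.

```r
library(C50)   # C5.0() and its predict method
library(ROCR)  # prediction() and performance()

# Hypothetical stand-in for the asker's `data`: a two-level factor `class`
# with a rare positive level, plus two numeric predictors.
set.seed(42)
n <- 1000
data <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
data$class <- factor(ifelse(data$x1 + rnorm(n) > 1.5, "pos", "neg"),
                     levels = c("neg", "pos"))

# Undersample the negatives so that both classes have the same number of rows.
pos.idx  <- which(data$class == "pos")
neg.idx  <- which(data$class == "neg")
neg.keep <- sample(neg.idx, length(pos.idx))
train    <- data[c(pos.idx, neg.keep), ]

# Fit WITHOUT the `costs` argument, so class probabilities can be requested.
classifier.under <- C5.0(class ~ ., data = train)

# Same ROC construction as in the question, now on a cost-free model.
pred.under <- predict(classifier.under, data, type = "prob")
perf.under <- performance(prediction(pred.under[, "pos"], data$class),
                          "tpr", "fpr")
ROC.under  <- data.frame(x = perf.under@x.values[[1]],
                         y = perf.under@y.values[[1]])
```

The resulting `ROC.under` has the same x/y layout as `ROC.class.cost` in the question, so it can be plotted with ggplot2 alongside the other curves. The sampling ratio plays the role the cost matrix played before, so it may need tuning to approximate the costs you had in mind.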

r

decision-tree

roc