Calculating prediction accuracy of a tree using rpart's predict method

Question

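I have built a decision tree for a dataset using rpart.

I split the data into two parts, a training dataset and a test dataset, and built the tree on the training data. I want to calculate the accuracy of the predictions made by the resulting model.

My code is shown below: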

library(rpart)
#reading the data
data = read.table("source")
names(data) <- c("a", "b", "c", "d", "class")

#generating test and train data - Data selected randomly with a 80/20 split
trainIndex <- sample(seq_len(nrow(data)), floor(0.8 * nrow(data)))
train <- data[trainIndex,]
test <- data[-trainIndex,]

#tree construction based on information gain
tree = rpart(class ~ a + b + c + d, data = train, method = 'class', parms = list(split = "information"))

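I now want to calculate the accuracy of the model's predictions by comparing them with the actual values for both the training and the test data, but I run into an error when doing so.

My code is shown below: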

t_pred = predict(tree,test,type="class")
t = test['class']
accuracy = sum(t_pred == t)/length(t)
print(accuracy)

I get the following error message:

Error in t_pred == t : comparison of these types is not implemented
In addition: Warning message:
Incompatible methods ("Ops.factor", "Ops.data.frame") for "=="

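On checking the type of t_pred, I found that it is of type integer, whereas the documentation

(https://stat.ethz.ch/R-manual/R-devel/library/rpart/html/predict.rpart.html)

states that the predict() method must return a vector.

I can't understand why the type of the variable is integer rather than a list. Where have I made a mistake, and how can I correct it?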

Answer 1

Try calculating the confusion matrix first:

confMat <- table(test$class,t_pred)

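Now you can calculate the accuracy by dividing the sum of the matrix's diagonal (i.e. the correct predictions) by the sum of the whole matrix: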

accuracy <- sum(diag(confMat))/sum(confMat)
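
To illustrate the calculation, here is a minimal sketch on toy labels. The vectors actual and predicted are made-up values standing in for test$class and t_pred, not data from the question. Both factors are given the same levels so that the table is square and its diagonal lines up with the correct predictions.

```r
# Toy labels (hypothetical values, standing in for test$class and t_pred)
actual    <- factor(c("yes", "no", "yes", "yes", "no"), levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "no",  "yes", "no"), levels = c("no", "yes"))

# Rows are the actual classes, columns are the predicted classes
confMat <- table(actual, predicted)
print(confMat)

# Correct predictions sit on the diagonal of the confusion matrix
accuracy <- sum(diag(confMat)) / sum(confMat)
print(accuracy)  # 4 of the 5 toy predictions match, so 0.8
```

If the two vectors did not share the same set of levels, the table might not be square and diag() would no longer pick out the matching cells, so it is worth checking the levels first.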

Answer 2

My answer is very similar to @mtoto's, but a bit simpler... I hope it helps too.

mean(test$class == t_pred)
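
As to why the comparison in the question fails while this one works: indexing a data frame with single brackets, as in test['class'], returns a one-column data frame rather than a vector, which is what the "Ops.data.frame" part of the warning refers to. The sketch below uses a small made-up test set (the values are hypothetical) to show the difference. It also shows that typeof() reports "integer" for a factor, because factors are stored as integer codes, which is probably what the asker saw.

```r
# Toy test set and predictions (hypothetical values)
test <- data.frame(a = c(1, 2, 3, 4),
                   class = factor(c("x", "y", "x", "y")))
t_pred <- factor(c("x", "y", "y", "y"), levels = levels(test$class))

# Single brackets keep the data frame wrapper
print(class(test["class"]))

# Double brackets (or $) extract the column itself, here a factor
print(class(test[["class"]]))

# A factor is stored as integer codes, so typeof() reports "integer"
print(typeof(t_pred))

# Comparing two factors with the same levels gives a logical vector;
# its mean is the share of correct predictions
accuracy <- mean(test$class == t_pred)
print(accuracy)  # 3 of the 4 toy predictions match, so 0.75
```

Note also that length() of a data frame is its number of columns, so length(test['class']) would be 1 rather than the number of rows.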

Tags: r, machine-learning, decision-tree, rpart