Error in prune.tree: "can not prune singlenode tree" when using cv.tree from the R tree package
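The original data set has 7499 observations of 19 variables. I am using the tree package in R to build a classification tree. The result looks reasonable and the plot is drawn without problems: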
library(tree)
tree.data = tree(Y~., data.train, control = tree.control(dim(data)[1], mincut = 10, minsize = 20, mindev = 0.001))
plot(tree.data)
text(tree.data, pretty = 0,cex=0.6)
However, when I try to prune the tree using cv.tree, I get an error.
cv.data = cv.tree(tree.data, FUN = prune.misclass)
Error in prune.tree(tree = list(frame = list(var = 1L, n = 6732, dev = 9089.97487458261, :
can not prune singlenode tree
I then checked the structure of tree.data.
summary(tree.data)
Classification tree:
tree(formula = Y ~ ., data = data.train, control = tree.control(dim(data)[1],
mincut = 10, minsize = 20, mindev = 0.001))
Variables actually used in tree construction:
[1] "X2" "X1" "X6" "X13" "X5" "X10" "X14" "X16" "X17" "X3" "X7" "X15" "X11" "X18"
[15] "X8" "X12"
Number of terminal nodes: 45
Residual mean deviance: 1.24 = 9243 / 7454
Misclassification error rate: 0.3475 = 2606 / 7499
This is not a single-node tree, so I am confused about why this error occurs.
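cv.tree produces this error when the tree ends up completely pruned, so that only the root node remains. I can reproduce your error by generating a set of X variables that are unrelated to Y.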
library(tree)
# Data generating process
# Y is NOT associated to any X variables
set.seed(1234)
X <- matrix(rnorm(7499*18), ncol=18)
Y <- rbinom(7499, 1, 0.5)
data <- data.frame(Y=factor(Y, labels=c("No","Yes")), X)
idx <- sample(1:nrow(data), 6000)
data.train <- data[idx,]
# Train the tree
tree.data = tree(Y~., data.train,
control=tree.control(dim(data)[1], mincut = 10, minsize = 20, mindev = 0.001))
plot(tree.data)
text(tree.data, pretty = 0,cex=0.6)
# Pruning by cv.tree
cv.data = cv.tree(tree.data, FUN = prune.misclass)
The error message is:
Error in prune.tree(tree = list(frame = list(var = 1L, n = 4842, dev =
6712.03745626047, : can not prune singlenode tree
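The n reported in the error message is smaller than the number of training rows, which suggests the failure happens on a tree grown from a subset of the data inside cv.tree rather than on tree.data itself. As a minimal sketch of a defensive pattern, the code below counts the terminal nodes of a fitted tree and wraps cv.tree in tryCatch so that a failed cross-validation returns NULL instead of stopping the script. The helper names n_leaves and safe_cv_misclass and the small synthetic data set are assumptions made for illustration; they are not part of the tree package or of the original post.

```r
library(tree)

# Hypothetical helper (not in the tree package): number of terminal nodes.
# Leaves are marked "<leaf>" in the var column of the tree's frame.
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

# Hypothetical helper: run cv.tree with prune.misclass, but report the
# error and return NULL instead of aborting when cross-validation fails.
safe_cv_misclass <- function(fit) {
  tryCatch(
    cv.tree(fit, FUN = prune.misclass),
    error = function(e) {
      message("cv.tree failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Small synthetic data set (assumed for illustration): Y is unrelated to X
set.seed(1)
d <- data.frame(Y = factor(rbinom(500, 1, 0.5), labels = c("No", "Yes")),
                matrix(rnorm(500 * 3), ncol = 3))
fit <- tree(Y ~ ., d)

cat("terminal nodes in the full tree:", n_leaves(fit), "\n")
cv.res <- safe_cv_misclass(fit)
if (is.null(cv.res)) cat("cross-validation could not be completed\n")
```

Whether cv.tree fails on this particular data set depends on the random fold assignment, so the sketch handles both outcomes.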
Now suppose X1 is associated with Y.
# Data generating process
set.seed(1234)
X <- matrix(rnorm(7499*18), ncol=18)
# Note: `+` binds tighter than `>`, so Y is the logical vector X[,1] > (0 + rbinom(...))
Y <- X[,1]>0 + rbinom(7499, 1, 0.2)
data <- data.frame(Y=factor(Y, labels=c("No","Yes")), X)
idx <- sample(1:nrow(data), 6000)
data.train <- data[idx,]
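The line `Y <- X[,1]>0 + rbinom(7499, 1, 0.2)` relies on R's operator precedence: arithmetic operators are evaluated before comparison operators, so the expression compares X[,1] against the random 0/1 draw. The short base R sketch below makes this visible; the vectors x and noise are toy values assumed for illustration.

```r
# Toy vectors (assumed values, chosen only to make the difference visible)
x     <- c(-1, 0.5, 2)
noise <- c(1, 0, 1)

# As written in the answer: parsed as x > (0 + noise), a logical vector
as_written <- x > 0 + noise

# Parenthesised alternative: logical plus number, giving counts 0, 1 or 2
parenthesised <- (x > 0) + noise

print(as_written)                        # FALSE TRUE TRUE
print(parenthesised)                     # 1 1 2
print(identical(as_written, x > noise))  # TRUE
```

With the parenthesised form Y could take three distinct values, which would no longer match the two labels passed to factor(), so the expression as written is the one that works with the rest of the code.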
The cv.tree command now no longer throws an error:
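# Refit the tree on the new training data (same call as above)
tree.data = tree(Y~., data.train,
control=tree.control(dim(data)[1], mincut = 10, minsize = 20, mindev = 0.001))
# Pruning by cv.tree
cv.data = cv.tree(tree.data, FUN = prune.misclass)
# Prune at the third cost-complexity value returned by cv.tree
pruned.tree <- prune.tree(tree.data, k=cv.data$k[3])
plot(pruned.tree)
text(pruned.tree, pretty = 0, cex=0.6)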
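The index 3 in `cv.data$k[3]` is just one point on the pruning sequence. As a hedged alternative, the sketch below picks the tree size with the lowest cross-validated misclassification count and prunes with prune.misclass, so that pruning uses the same criterion as the cross-validation. The synthetic data set and the object names d, fit, cv.fit and best.size are assumptions for illustration, not part of the original answer.

```r
library(tree)

# Synthetic data (assumed for illustration): Y depends strongly on X1
set.seed(1234)
X <- matrix(rnorm(2000 * 5), ncol = 5)
Y <- factor(ifelse(X[, 1] + rnorm(2000, sd = 0.5) > 0, "Yes", "No"))
d <- data.frame(Y = Y, X)

# Grow the tree and cross-validate the misclassification-based pruning sequence
fit    <- tree(Y ~ ., d)
cv.fit <- cv.tree(fit, FUN = prune.misclass)

# Size (number of terminal nodes) with the smallest cross-validated error;
# which.min returns the first minimum if there are ties
best.size <- cv.fit$size[which.min(cv.fit$dev)]
cat("selected number of terminal nodes:", best.size, "\n")

# Prune only if the selected size still contains at least one split
if (best.size > 1) {
  pruned <- prune.misclass(fit, best = best.size)
  plot(pruned)
  text(pruned, pretty = 0, cex = 0.6)
}
```

Plotting `cv.fit$size` against `cv.fit$dev` before choosing a size is a reasonable sanity check, since several sizes often have nearly identical cross-validated error.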