When using the default 'randomForest' algorithm for classification, why doesn't the number of terminal nodes match the number of cases?

Question

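According to https://cran.r-project.org/web/packages/randomForest/randomForest.pdf, classification trees are fully grown, meaning nodesize = 1. But if the trees really are grown to the maximum, shouldn't each terminal node contain a single case (data point, species, etc.)? If I run: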

library(randomForest)
data(iris) #150 cases
set.seed(352)
rf <- randomForest(Species ~ ., iris)
hist(treesize(rf),main ="number of nodes")

I can see that most of the "fully grown" trees have only about 10 nodes, which means nodesize can't be equal to 1... right?

For example, a status of -1 below marks a terminal node in the 134th tree of the forest. There are only 8 terminal nodes!?

> getTree(rf,134)
   left daughter right daughter split var split point status prediction
1              2              3         3        2.50      1          0
2              0              0         0        0.00     -1          1
3              4              5         4        1.75      1          0
4              6              7         3        4.95      1          0
5              8              9         3        4.85      1          0
6             10             11         4        1.60      1          0
7             12             13         1        6.50      1          0
8             14             15         1        5.95      1          0
9              0              0         0        0.00     -1          3
10             0              0         0        0.00     -1          2
11             0              0         0        0.00     -1          3
12             0              0         0        0.00     -1          3
13             0              0         0        0.00     -1          2
14             0              0         0        0.00     -1          2
15             0              0         0        0.00     -1          3

I would be grateful if someone could explain this.

Answer 1

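"Fully grown" means "nothing left to split". A node of a decision tree is fully grown once all the data records assigned to it yield the same prediction. nodesize is the minimum size of a terminal node, not a size that every terminal node must reach, so a pure node stops splitting no matter how many cases it holds.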

In the case of the iris dataset, once you reach a node that contains 50 setosa data records, there is no point in splitting it into two child nodes with 25 setosas each.
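As a rough illustration of that point, the sketch below replays the root split of tree 134 from the question (split var 3, i.e. Petal.Length, at 2.50) on the full iris data using base R only. The helper is_pure is a hypothetical name introduced here, not part of randomForest. Note also that each tree in the forest is grown on a bootstrap sample, so the real node in tree 134 holds a resampled set of setosas rather than exactly these 50 rows; the purity argument is the same.

```r
# iris ships with base R (package 'datasets'): 150 rows, 50 per species.
data(iris)

# Hypothetical helper (not part of randomForest): a node is "pure" when
# every case in it has the same class label, so no split can improve it.
is_pure <- function(labels) length(unique(labels)) == 1

# Root split of tree 134 in the question: Petal.Length at 2.50.
# Cases at or below the split point go to the left daughter.
left  <- iris[iris$Petal.Length <= 2.5, ]
right <- iris[iris$Petal.Length >  2.5, ]

table(left$Species)     # all 50 cases on this side are setosa
is_pure(left$Species)   # TRUE:  terminal node, despite holding 50 cases
is_pure(right$Species)  # FALSE: versicolor and virginica still mixed, keep splitting
```

This is why node 2 in the getTree output has status -1 right after the first split: it is already pure, so growing it further would not change any prediction.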

decision-tree

random-forest