Combining DF and rpart$where?

If I fit an rpart object using DF as my data and then run DF$where <- tree$where, will each row be mapped to its corresponding leaf through the where column?

Thanks!

Here is one way to check that this is probably right (assuming I have understood your question correctly), using the first example from ?rpart:

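library(rpart)
# Fit the classification tree from the first example in ?rpart
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
# Attach the per-observation leaf assignment to the data
kyphosis$where <- fit$where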

> str(kyphosis)
'data.frame':   81 obs. of  5 variables:
 $ Kyphosis: Factor w/ 2 levels "absent","present": 1 1 2 1 1 1 1 1 1 2 ...
 $ Age     : int  71 158 128 2 1 1 61 37 113 59 ...
 $ Number  : int  3 3 4 5 4 2 2 3 2 6 ...
 $ Start   : int  5 14 5 1 15 16 17 16 16 12 ...
 $ where   : int  9 7 9 9 3 3 3 3 3 8 ...
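One caveat on the direct assignment: it relies on fit$where having one entry per row of the data, in the same order. That holds here because kyphosis has no missing values, but rpart's default na.action can drop rows (for example, rows with a missing response), and then the lengths would no longer agree. As far as I know, fit$where is named by the row names of the data actually used in the fit, so a slightly more defensive sketch, under that assumption, is to match on row names instead of position:

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Assumption: names(fit$where) are the row names of the rows used in the fit.
# Indexing by row name leaves NA for any row that rpart dropped,
# instead of silently misaligning or failing on a length mismatch.
kyphosis$where <- fit$where[rownames(kyphosis)]

# No rows are dropped for this data set, so no NAs are expected here
anyNA(kyphosis$where)
```

With data that does contain missing values, the NA entries in the new column would mark the rows that never entered the tree.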

> plot(fit)
> text(fit, use.n = TRUE)

Now look at some tables that cross the 'where' vector with a few logical tests:

First node:

> with(kyphosis, table(where, Start >= 8.5)) 


where FALSE TRUE
    3     0   29
    5     0   12
    7     0   14
    8     0    7
    9    19    0  # all 19 cases with Start < 8.5 have where == 9, so row 9 of fit$frame describes that leaf
> fit$frame[9,]
     var  n wt dev yval complexity ncompete nsurrogate   yval2.V1
3 <leaf> 19 19   8    2       0.01        0          0  2.0000000
    yval2.V2   yval2.V3   yval2.V4   yval2.V5 yval2.nodeprob
3  8.0000000 11.0000000  0.4210526  0.5789474      0.2345679

Second node:

> with(kyphosis, table(where, Start >= 8.5, Start>=14.5))
, ,  = FALSE


where FALSE TRUE
    3     0    0
    5     0   12
    7     0   14
    8     0    7
    9    19    0

, ,  = TRUE


where FALSE TRUE
    3     0   29
    5     0    0
    7     0    0
    8     0    0
    9     0    0

And this is the row of fit$frame for the leaf on the Start >= 14.5 side of the second split (where == 3):

> fit$frame[3,]
     var  n wt dev yval complexity ncompete nsurrogate   yval2.V1
4 <leaf> 29 29   0    1       0.01        0          0  1.0000000
    yval2.V2   yval2.V3   yval2.V4   yval2.V5 yval2.nodeprob
4 29.0000000  0.0000000  1.0000000  0.0000000      0.3580247
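The two hand checks above can be repeated for every leaf at once. This is only a sketch of one way to do it: it tabulates fit$where and compares the counts with the n column in the rows of fit$frame that those values index. The name leaf_check is just an illustrative variable, not part of rpart.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Number of observations assigned to each distinct value of fit$where
counts <- table(fit$where)
leaf_rows <- as.integer(names(counts))

# Compare with the rows of fit$frame that those values point to
leaf_check <- data.frame(
  frame_row = leaf_rows,
  node      = rownames(fit$frame)[leaf_rows],  # node number is the row name
  var       = as.character(fit$frame$var[leaf_rows]),
  n_where   = as.vector(counts),
  n_frame   = fit$frame$n[leaf_rows]
)
leaf_check
```

If where really indexes rows of fit$frame, every var should be '<leaf>' and n_where should equal n_frame in each row.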

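So I would describe the values of fit$where as identifying the "terminal nodes", which are the rows labelled '<leaf>' in fit$frame. Note that each value is a row index into fit$frame rather than the node number itself: row 9 is node 3 and row 3 is node 4, as the row names above show. That may or may not be what you meant by "nodes".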

> fit$frame
      var  n wt dev yval complexity ncompete nsurrogate    yval2.V1
1   Start 81 81  17    1 0.17647059        2          1  1.00000000
2   Start 62 62   6    1 0.01960784        2          2  1.00000000
4  <leaf> 29 29   0    1 0.01000000        0          0  1.00000000
5     Age 33 33   6    1 0.01960784        2          2  1.00000000
10 <leaf> 12 12   0    1 0.01000000        0          0  1.00000000
11    Age 21 21   6    1 0.01960784        2          0  1.00000000
22 <leaf> 14 14   2    1 0.01000000        0          0  1.00000000
23 <leaf>  7  7   3    2 0.01000000        0          0  2.00000000
3  <leaf> 19 19   8    2 0.01000000        0          0  2.00000000
      yval2.V2    yval2.V3    yval2.V4    yval2.V5 yval2.nodeprob
1  64.00000000 17.00000000  0.79012346  0.20987654     1.00000000
2  56.00000000  6.00000000  0.90322581  0.09677419     0.76543210
4  29.00000000  0.00000000  1.00000000  0.00000000     0.35802469
5  27.00000000  6.00000000  0.81818182  0.18181818     0.40740741
10 12.00000000  0.00000000  1.00000000  0.00000000     0.14814815
11 15.00000000  6.00000000  0.71428571  0.28571429     0.25925926
22 12.00000000  2.00000000  0.85714286  0.14285714     0.17283951
23  3.00000000  4.00000000  0.42857143  0.57142857     0.08641975
3   8.00000000 11.00000000  0.42105263  0.57894737     0.23456790
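Because fit$where holds row indices, getting the actual node numbers takes one more lookup through the row names of fit$frame. The following is a minimal sketch of that lookup, with a cross-check against predict(); node_id and leaf_class are illustrative names, and the comparison assumes a classification tree, where yval is the index of the predicted class level.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Row names of fit$frame are the node numbers; fit$where indexes its rows
node_id <- as.integer(rownames(fit$frame))[fit$where]
table(where = fit$where, node = node_id)

# Every row referenced by fit$where should be a terminal node
all(fit$frame$var[fit$where] == "<leaf>")

# Cross-check: the class stored in each observation's leaf should agree
# with what predict() returns when the same data are passed down the tree
leaf_class <- levels(kyphosis$Kyphosis)[fit$frame$yval[fit$where]]
pred_class <- as.character(predict(fit, newdata = kyphosis, type = "class"))
all(leaf_class == pred_class)
```

Storing node_id alongside where in the data frame makes it easier to relate rows to the node labels used by print(fit).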