Nodes in Decision Tree in R - more nodes needed

I created a decision tree in R. When I plot it, I get only 3 nodes (1 root node and 2 terminal nodes). The call I used to create the tree is:

FertilityTree <- rpart(Output ~ Age + Surgery + RDrugs + SpermCount + Smoker, data = FertilityTree, method = "class")

The resulting plot is: http://rpubs.com/BonitaWilliams/fertilitydecisiontree

Can you help me produce a plot that shows more nodes? Or tell me why I have so few?

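There are parameters you can supply that give the splitting algorithm more information and may increase (or decrease) the number of nodes. If there is a particular outcome for which errors are expected to be more costly, you need to assign it a different loss. Without example data there is no tested example code, but the 'loss' specification should take a form like this: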

FertilityTree <- rpart(Output ~ Age + Surgery + RDrugs + SpermCount + Smoker,
                       data = FertilityTree, method = "class",
                       parms = list(loss = matrix(c(0, 1, 2, 0), 2)))

I can't guarantee it will remain available until the end of SO (or of the universe, whichever comes first), but for now you can find a discussion and worked example here. The loss matrix must have positive off-diagonal elements and zeros on the diagonal. Other parameters can be adjusted as well, including 'prior', and the splitting criterion can be changed to "information". See the parms section of ?rpart for the requirements on these options. It is also possible to change the splitting criterion to a user-supplied version. Terry Therneau posted this in 2011 on Rhelp.
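As a rough sketch of those three parms options, the code below fits trees on the kyphosis data that ships with rpart. That dataset is only a stand-in for the fertility data, which is not available here, and the loss and prior values are arbitrary illustrations rather than recommendations.

```r
library(rpart)

# kyphosis (bundled with rpart) stands in for the unavailable fertility data.
# matrix() fills column by column, so loss[1, 2] is 2 and loss[2, 1] is 1.
# As I read the rpart documentation, rows index the true class and columns
# the predicted class; check ?rpart before relying on that orientation.
loss <- matrix(c(0, 1, 2, 0), nrow = 2)

fit_loss <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class", parms = list(loss = loss))

# Entropy-based splitting instead of the default Gini index.
fit_info <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class", parms = list(split = "information"))

# Equal class priors instead of the observed class frequencies.
fit_prior <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class", parms = list(prior = c(0.5, 0.5)))

# Each row of the frame component is one node of the fitted tree.
node_counts <- sapply(list(loss = fit_loss, info = fit_info, prior = fit_prior),
                      function(fit) nrow(fit$frame))
print(node_counts)
```

Comparing the node counts shows whether any of these settings changes the size of the tree on a given dataset. None of them is guaranteed to add splits.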

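There is also rpart.control, which builds a list of settings that is passed to rpart through its control argument. You might try control = rpart.control(minsplit = 10) as a simple first step toward allowing further splits. Lowering the complexity parameter cp may also have an effect. This is the Usage section from the help page: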

?rpart.control
rpart.control(minsplit = 20, minbucket = round(minsplit/3), cp = 0.01, 
              maxcompete = 4, maxsurrogate = 5, usesurrogate = 2, xval = 10,
              surrogatestyle = 0, maxdepth = 30, ...)
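A minimal sketch of relaxing those controls, again using the bundled kyphosis data as a stand-in for the fertility data. The values minsplit = 10 and cp = 0.001 are illustrative guesses, not tuned settings.

```r
library(rpart)

# Default controls: minsplit = 20, cp = 0.01.
fit_default <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                     method = "class")

# Looser controls: allow splits of smaller nodes and keep splits that
# improve the fit by less. minbucket is left at its default, which is
# derived from minsplit.
fit_loose <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class",
                   control = rpart.control(minsplit = 10, cp = 0.001))

# Count terminal nodes in each tree.
leaves <- function(fit) sum(fit$frame$var == "<leaf>")
cat("default leaves:", leaves(fit_default), "\n")
cat("loose leaves:  ", leaves(fit_loose), "\n")

# The cp table helps judge whether the extra splits are worth keeping.
printcp(fit_loose)
```

Looser controls usually give a larger tree, but a larger tree is also more prone to overfitting, so it is worth checking the cross-validated error (xerror) in the cp table before trusting the extra nodes.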