How to set the splitting rule in a decision_tree spec?
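When I use the tidymodels metapackage and the decision_tree() function to create a spec and fit a decision tree, the default splitting method/rule for categorical data in the rpart package is the Gini index, which is set through the parms argument of rpart::rpart().
Creating a random forest model with the ranger engine uses the same default for categorical data. My question is: how can I change the splitting method to information gain or Shannon entropy?
Here is an example (see the str() calls and the formas_forest_fit object for the splitting rule):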
# install.packages(c("tidymodels", "rpart", "ranger"))
library(tidymodels)
formas <- tibble(
  Color = c("Rojo", "Azul", "Rojo", "Verde", "Rojo", "Verde"),
  Forma = c("Cuadrado", "Cuadrado", "Redondo", "Cuadrado", "Redondo", "Cuadrado"),
  `Tamaño` = c("Grande", "Grande", "Pequeño", "Pequeño", "Grande", "Grande"),
  Compra = structure(c(2L, 2L, 1L, 1L, 2L, 1L), .Label = c("No", "Si"), class = "factor")
)
# Tree spec and fit -----------------------
formas_tree_spec <-
  decision_tree(min_n = 2) %>%
  set_mode("classification") %>%
  set_engine("rpart")
formas_tree_fit <-
  fit(
    formas_tree_spec,
    data = formas,
    formula = Compra ~ .
  )
# Forest spec and fit ----------------------
formas_forest_spec <-
  rand_forest(trees = 5000, min_n = 2) %>%
  set_mode("classification") %>%
  set_engine("ranger")
formas_forest_fit <-
  fit(
    formas_forest_spec,
    data = formas,
    formula = Compra ~ .
  )
str(rpart::rpart)
str(ranger::ranger)
formas_forest_fit
Following Emil Hvitfeldt's suggestion: set_engine() lets us pass arguments directly to the engine function.
Here is the tree with the information gain splitting rule:
formas_tree_spec <-
  decision_tree(min_n = 2) %>%
  set_mode("classification") %>%
  # parms is forwarded unchanged to rpart::rpart()
  set_engine("rpart", parms = list(split = "information"))
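To check that the engine argument really reaches the underlying function, it may help to look at the translated spec and at the fitted engine object. The following is only a sketch, assuming the same toy data as in the question; the object names info_tree_spec, info_tree_fit, et_forest_spec and et_forest_fit are made up for illustration. It also covers the ranger half of the question: as far as I know, ranger's splitrule argument for classification accepts "gini", "extratrees" or "hellinger", so there is no entropy option to select there, but the pass-through mechanism is the same.

```r
library(tidymodels)

# Same toy data as in the question, with the outcome built via factor()
formas <- tibble(
  Color = c("Rojo", "Azul", "Rojo", "Verde", "Rojo", "Verde"),
  Forma = c("Cuadrado", "Cuadrado", "Redondo", "Cuadrado", "Redondo", "Cuadrado"),
  `Tamaño` = c("Grande", "Grande", "Pequeño", "Pequeño", "Grande", "Grande"),
  Compra = factor(c("Si", "Si", "No", "No", "Si", "No"), levels = c("No", "Si"))
)

# rpart: request the information (entropy-based) splitting rule
info_tree_spec <-
  decision_tree(min_n = 2) %>%
  set_mode("classification") %>%
  set_engine("rpart", parms = list(split = "information"))

# Print the rpart::rpart() call template parsnip will use, including parms
translate(info_tree_spec)

info_tree_fit <- fit(info_tree_spec, formula = Compra ~ ., data = formas)

# Pull out the raw rpart object and inspect the rule it recorded.
# rpart keeps the rule as an integer code
# (assumed here: 1 = gini, 2 = information).
rpart_obj <- extract_fit_engine(info_tree_fit)
rpart_obj$parms$split

# ranger: engine arguments are passed the same way; "extratrees" is used
# only to show a non-default rule, not as an entropy substitute
et_forest_spec <-
  rand_forest(trees = 500, min_n = 2) %>%
  set_mode("classification") %>%
  set_engine("ranger", splitrule = "extratrees")

et_forest_fit <- fit(et_forest_spec, formula = Compra ~ ., data = formas)

# The ranger object reports the rule that was actually used
ranger_obj <- extract_fit_engine(et_forest_fit)
ranger_obj$splitrule
```

Printing info_tree_fit or et_forest_fit should also show the engine's own summary, which is another place to confirm the rule.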