CHAID regression tree to table conversion in R
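I used the CHAID package from this link. It gives me a chaid object that I can plot. Instead of a decision tree, I want a decision table with each decision rule in a column, but I don't understand how to access the nodes and paths in this chaid object. Please help.
I followed the procedure given in this link.
I can't post my data here because it is too long, so I am posting code that uses the sample dataset provided with CHAID to perform the task.
Copied from the CHAID help manual: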
library("CHAID")
### fit tree to subsample
set.seed(290875)
USvoteS <- USvote[sample(1:nrow(USvote), 1000),]
ctrl <- chaid_control(minsplit = 200, minprob = 0.1)
chaidUS <- chaid(vote3 ~ ., data = USvoteS, control = ctrl)
print(chaidUS)
plot(chaidUS)
Output:
Model formula:
vote3 ~ gender + ager + empstat + educr + marstat
Fitted party:
[1] root
| [2] marstat in married
| | [3] educr in <HS, HS, >HS: Gore (n = 311, err = 49.5%)
| | [4] educr in College, Post Coll: Bush (n = 249, err = 35.3%)
| [5] marstat in widowed, divorced, never married
| | [6] gender in male: Gore (n = 159, err = 47.8%)
| | [7] gender in female
| | | [8] ager in 18-24, 25-34, 35-44, 45-54: Gore (n = 127, err = 22.0%)
| | | [9] ager in 55-64, 65+: Gore (n = 115, err = 40.9%)
Number of inner nodes: 4
Number of terminal nodes: 5
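So my question is how to get this tree data into a decision table with each decision rule (branch/path) in a column. I don't understand how to access the different tree paths from this chaid object.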
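The CHAID package uses the partykit (recursive partitioning) tree structure. You can walk the tree using party nodes: a node is either a terminal node or has a list of child nodes, and it carries information about the decision rule (split) and the fitted data.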
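To make that structure concrete, here is a minimal sketch that inspects the root node of the sample tree from the question using partykit's accessor functions. It assumes the CHAID and partykit packages are installed. The names root, sp, split_var, branch_of_level, n_branches and terminal_ids are chosen for illustration only. Because the result of sample() depends on the R version, the fitted tree may differ from the one printed in the question.

```r
library("CHAID")     # provides chaid(), chaid_control() and the USvote data
library("partykit")  # provides the node accessors used below

# Refit the sample tree from the question
set.seed(290875)
USvoteS <- USvote[sample(1:nrow(USvote), 1000), ]
ctrl <- chaid_control(minsplit = 200, minprob = 0.1)
chaidUS <- chaid(vote3 ~ ., data = USvoteS, control = ctrl)

root <- node_party(chaidUS)                        # root node, class "partynode"
terminal_ids <- nodeids(chaidUS, terminal = TRUE)  # ids of all terminal nodes

if (!is.terminal(root)) {
  sp <- split_node(root)                             # split (decision rule) stored in the root
  split_var <- names(chaidUS$data)[varid_split(sp)]  # name of the split variable
  branch_of_level <- index_split(sp)                 # for a factor: child branch of each level
  n_branches <- length(kids_node(root))              # number of child nodes below the root
  print(split_var)
  print(branch_of_level)
  print(n_branches)
}
print(terminal_ids)
```

A recursive walk only needs to repeat these calls on every element of kids_node() until is.terminal() returns TRUE.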
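The code below walks the tree and creates the decision table. It was written for demonstration purposes and has only been tested on one sample tree.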
tree2table <- function(party_tree) {

  df_list <- list()
  var_names <- attr(party_tree$terms, "term.labels")
  var_levels <- lapply(party_tree$data, levels)

  walk_the_tree <- function(node, rule_branch = NULL) {
    # depth-first walk on the partynode structure (recursive function)
    # decision rules are extracted for every branch
    if (missing(rule_branch)) {
      # first call: one-row data frame with one NA column per predictor
      rule_branch <- setNames(data.frame(t(replicate(length(var_names), NA))), var_names)
      rule_branch <- cbind(rule_branch, nodeId = NA)
      rule_branch <- cbind(rule_branch, predict = NA)
    }
    if (is.terminal(node)) {
      # terminal node: store the rules collected along this branch
      rule_branch[["nodeId"]] <- node$id
      rule_branch[["predict"]] <- predict_party(party_tree, node$id)
      df_list[[as.character(node$id)]] <<- rule_branch
    } else {
      # inner node: record the rule for each child branch, then descend
      for (i in 1:length(node)) {
        rule_branch1 <- rule_branch
        val1 <- decision_rule(node, i)
        rule_branch1[[names(val1)[1]]] <- val1
        walk_the_tree(node[i], rule_branch1)
      }
    }
  }

  decision_rule <- function(node, i) {
    # returns the split decision rule for branch i as a character string
    # of factor levels, named after the split variable
    var_name <- var_names[node$split$varid[[1]]]
    values_vec <- var_levels[[var_name]][node$split$index == i]
    values_txt <- paste(values_vec, collapse = ", ")
    return(setNames(values_txt, var_name))
  }

  # compile the list of one-row data frames
  walk_the_tree(party_tree$node)
  # merge all data frames
  res_table <- Reduce(rbind, df_list)
  return(res_table)
}
Call the function with the CHAID tree object:
table1 <- tree2table(chaidUS)
The result should look like this:
gender ager empstat educr marstat nodeId predict
-------- -------------------------- --------- ------------------ -------------------------------- -------- ---------
NA NA NA <HS, HS, >HS married 3 Gore
NA NA NA College, Post Coll married 4 Bush
male NA NA NA widowed, divorced, never married 6 Gore
female 18-24, 25-34, 35-44, 45-54 NA NA widowed, divorced, never married 8 Gore
female 55-64, 65+ NA NA widowed, divorced, never married 9 Gore
First of all, thanks for this great function.
One small modification from my side: to get the predicted class probabilities, use predict_party(party_tree, node$id, type = 'prob') instead of predict_party(party_tree, node$id). To get the probability of a specific class, use predict_party(party_tree, node$id, type = 'prob')[1] or predict_party(party_tree, node$id, type = 'prob')[2].
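As a rough illustration of this suggestion outside the tree2table function, the sketch below requests the class probabilities for all terminal nodes in one call. It assumes the CHAID and partykit packages are installed, that the tree has more than one terminal node, and that predict_party() returns one row per node id for type = "prob". The names terminal_ids, probs and prob_table are illustrative and not part of the original answer.

```r
library("CHAID")     # provides chaid(), chaid_control() and the USvote data
library("partykit")  # provides nodeids() and predict_party()

# Refit the sample tree from the question
set.seed(290875)
USvoteS <- USvote[sample(1:nrow(USvote), 1000), ]
ctrl <- chaid_control(minsplit = 200, minprob = 0.1)
chaidUS <- chaid(vote3 ~ ., data = USvoteS, control = ctrl)

# ids of the terminal nodes, i.e. the rows of the decision table
terminal_ids <- nodeids(chaidUS, terminal = TRUE)

# assumed result: a matrix with one row per node id and one column per class
probs <- predict_party(chaidUS, id = terminal_ids, type = "prob")

# attach the node ids so the probabilities can be matched to the rule table
prob_table <- data.frame(nodeId = terminal_ids, probs, check.names = FALSE)
print(prob_table)
```

If the decision table from tree2table is available as table1, something like merge(table1, prob_table, by = "nodeId") should place the probabilities next to the rules.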