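How can I visually represent a contingency table with multiple variables as a decision tree in R?
For example, suppose I ask a respondent whether s/he has a disease. From there, I ask whether her/his father has had the disease. If the answer to that question is yes, I then ask whether the father is now cured. If the father did not have the disease, the last question is not applicable.
Can I create such a "decision tree" in R or elsewhere?
Here is some usable data, where 0 means "no" and 1 means "yes":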
person_disease <- c(rep(1, 10), rep(0, 20))
father_disease <- c(rep(1, 7), rep(0,18), rep(1,5))
father_cured <- c( rep(0, 4), rep(1,3), rep(NA,18),rep(1,5) )
##
df <- data.frame(person_disease, father_disease, father_cured)
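The counts that such a tree would need to display can be read off a flat contingency table first. This is a minimal base R sketch using the data above; the names tab and first_split are only illustrative.

```r
# Data from the question: 0 = "no", 1 = "yes", NA = not applicable
person_disease <- c(rep(1, 10), rep(0, 20))
father_disease <- c(rep(1, 7), rep(0, 18), rep(1, 5))
father_cured   <- c(rep(0, 4), rep(1, 3), rep(NA, 18), rep(1, 5))
df <- data.frame(person_disease, father_disease, father_cured)

# Three-way contingency table; useNA = "ifany" keeps the
# "not applicable" respondents instead of silently dropping them
tab <- table(df, useNA = "ifany")

# ftable() flattens the three dimensions into one printable table
print(ftable(tab))

# Counts for the first two questions only
first_split <- table(person = df$person_disease, father = df$father_disease)
print(first_split)
```

Each non-empty cell of the flat table corresponds to one path from the root of the tree to a leaf.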
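There is indeed a package for creating this type of graphic, and it is conveniently called diagram.
It is not an automatic plotting routine like barplot() or qplot(), but you can use it to build the kind of diagram you want.
With some discipline, you can write code that makes the process more automated for your specific data and situation.
More information about the diagram package is available in its PDF documentation.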
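As a rough sketch of what this could look like with the diagram package, the example below fills a link matrix with edge labels and passes it to plotmat(). The box labels, the variable names (boxes, M) and the layout are assumptions made for illustration. The matrix follows plotmat()'s "rows = to, columns = from" convention as I understand it, so check ?plotmat before relying on it.

```r
# Data from the question (only the first two variables are needed here)
person_disease <- c(rep(1, 10), rep(0, 20))
father_disease <- c(rep(1, 7), rep(0, 18), rep(1, 5))

# Counts to show on the edges
n_yes        <- sum(person_disease == 1)
n_no         <- sum(person_disease == 0)
n_father_yes <- sum(person_disease == 1 & father_disease == 1)

# Box labels (illustrative), ordered top to bottom and left to right
boxes <- c("Disease", "Father Disease", "Father Disease", "Father Cured")

# Link matrix: M[to, from] holds the edge label, 0 means "no arrow"
M <- matrix(0, nrow = 4, ncol = 4)
M[2, 1] <- paste0("yes (n = ", n_yes, ")")
M[3, 1] <- paste0("no (n = ", n_no, ")")
M[4, 2] <- paste0("yes (n = ", n_father_yes, ")")

# Draw only if the package is installed; pos gives the number of boxes per row
if (requireNamespace("diagram", quietly = TRUE)) {
  diagram::plotmat(M, pos = c(1, 2, 1), name = boxes,
                   curve = 0, box.type = "rect",
                   box.size = 0.12, box.prop = 0.4)
}
```

The counts are computed from the data rather than typed in, which is the kind of automation this answer has in mind.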
You can use the data.tree package for this. There are many ways to do what you want. For example:
person_disease <- c(rep(1, 10), rep(0, 20))
father_disease <- c(rep(1, 7), rep(0,18), rep(1,5))
father_cured <- c( rep(0, 4), rep(1,3), rep(NA,18),rep(1,5) )
df <- data.frame(person_disease, father_disease, father_cured)
library(data.tree)
# Here the tree is constructed "manually".
# Depending on your data and your needs, you might want to generate the tree directly from the data;
# many examples are available in the vignettes, see browseVignettes("data.tree").
# Each child stores a "condition": the filter that selects its rows from the parent's data.
disease <- Node$new("Disease", data = df)
father_disease_yes <- disease$AddChild("Father Disease Yes", label = "Father Disease", edge = "yes", condition = function(df) df[df$person_disease == 1, ])
# Filter on father_disease here: father_cured is NA where the question does not apply,
# and subsetting with an NA comparison would add all-NA rows and inflate the count.
father_cured_yes <- father_disease_yes$AddChild("Father Cured Yes", label = "Father Cured", edge = "yes", condition = function(df) df[df$father_disease == 1, ])
father_disease_no <- disease$AddChild("Father Disease No", label = "Father Disease", edge = "no", condition = function(df) df[df$person_disease == 0, ])
# Data filter (pre-order)
# An alternative would be to do this recursively
disease$Do(function(node) {
  for (child in node$children) {
    child$data <- child$condition(node$data)
  }
})
print(disease, total = function(node) nrow(node$data))

# Plotting
# (many more options are available, see ?plot.Node)
SetEdgeStyle(disease,
             fontname = "helvetica",
             arrowhead = "none",
             label = function(node) paste0(node$edge, "\n", "total = ", nrow(node$data)))
SetNodeStyle(disease,
             fontname = "helvetica",
             label = function(node) node$label)
plot(disease)
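The code comments above mention generating the tree directly from the data. One possible way to do that, sketched here under my own assumptions, is to turn every respondent into a path string, count identical paths, and let as.Node() build the tree. The node names, the helper yn() and the attribute name n_resp are hypothetical choices, not part of the original answer.

```r
library(data.tree)

person_disease <- c(rep(1, 10), rep(0, 20))
father_disease <- c(rep(1, 7), rep(0, 18), rep(1, 5))
father_cured   <- c(rep(0, 4), rep(1, 3), rep(NA, 18), rep(1, 5))
df <- data.frame(person_disease, father_disease, father_cured)

# Hypothetical helper: recode 1/0 as "yes"/"no"
yn <- function(x) ifelse(x == 1, "yes", "no")

# One path per respondent, one level per question that was asked
paths <- paste("Respondent",
               paste("disease", yn(df$person_disease)),
               paste("father disease", yn(df$father_disease)),
               sep = "/")

# Append the third level only where the question applies (not NA)
asked <- !is.na(df$father_cured)
paths[asked] <- paste(paths[asked],
                      paste("father cured", yn(df$father_cured[asked])),
                      sep = "/")

# Count respondents per distinct path; these become the leaves
leaf_counts <- as.data.frame(table(pathString = paths),
                             stringsAsFactors = FALSE)
names(leaf_counts)[2] <- "n_resp"

# Build the tree from the pathString column
tree <- as.Node(leaf_counts)

# Sum the leaf counts upwards so inner nodes carry totals as well
tree$Do(function(node) node$n_resp <- Aggregate(node, "n_resp", sum),
        traversal = "post-order")

print(tree, "n_resp")
```

The resulting tree can be styled with SetEdgeStyle() and SetNodeStyle() and drawn with plot(tree) in the same way as the manually built one.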