Prp plot - Coloring positive and negative values differently
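I am fitting a regression tree with the function rpart(). Depending on my data, the nodes will have positive or negative estimates. Is there a way to color them differently?
In particular, what I want is a tree whose nodes are shaded blue for negative values and red for positive values, with darker shades indicating larger absolute values.
I attach a minimal reproducible example.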
library(rpart)
library(rpart.plot)
# Simulating data.
set.seed(1986)
X = matrix(rnorm(2000, 0, 1), nrow = 1000, ncol = 2)
epsilon = matrix(rnorm(1000, 0, 0.01), nrow = 1000)
y = X[, 1] + X[, 2] + epsilon
dta = data.frame(X, y)
# Fitting regression tree.
my.tree = rpart(y ~ X1 + X2, data = dta, method = "anova", maxdepth = 3)
# Plotting.
prp(my.tree,
    type = 2,
    clip.right.labs = FALSE,
    extra = 101,
    under = FALSE,
    under.cex = 1,
    fallen.leaves = TRUE,
    box.palette = "BuRd",
    branch = 1,
    round = 0,
    leaf.round = 0,
    prefix = "",
    main = "",
    cex.main = 1.5,
    branch.col = "gray",
    branch.lwd = 3)
# Repeating, with median(y) != 0.
X = matrix(rnorm(2000, 5, 1), nrow = 1000, ncol = 2)
epsilon = matrix(rnorm(1000, 0, 0.01), nrow = 1000)
y = X[, 1] + X[, 2] + epsilon
dta = data.frame(X, y)
my.tree = rpart(y ~ X1 + X2, data = dta, method = "anova", maxdepth = 3)
# HERE I NEED HELP!
prp(my.tree,
    type = 2,
    clip.right.labs = FALSE,
    extra = 101,
    under = FALSE,
    under.cex = 1,
    fallen.leaves = TRUE,
    box.palette = "BuRd",
    branch = 1,
    round = 0,
    leaf.round = 0,
    prefix = "",
    main = "",
    cex.main = 1.5,
    branch.col = "gray",
    branch.lwd = 3)
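As far as I understand, the box.palette option gives me the result I need in the first setup only because median(y) is close to zero.
Indeed, the second setup is not what I want: I get shades of blue for values below median(y) and shades of red for values above it. How can I make zero the threshold between the two colors?
More specifically, I would like a command that automatically enforces this two-color scheme in any tree.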
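OK, I answered my own question. The solution is actually simple: if the box.palette option is a two-color diverging palette (as in my example), we can use pal.thresh to set the threshold we want. In my case: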
prp(my.tree,
    type = 2,
    clip.right.labs = FALSE,
    extra = 101,
    under = FALSE,
    under.cex = 1,
    fallen.leaves = TRUE,
    box.palette = "BuRd",
    branch = 1,
    round = 0,
    leaf.round = 0,
    prefix = "",
    main = "",
    cex.main = 1.5,
    branch.col = "gray",
    branch.lwd = 3,
    pal.thresh = 0) # HERE THE SOLUTION!
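For anyone who wants this behavior on every tree, the call above could be wrapped in a small helper. This is only a sketch: plot_signed_tree is a hypothetical name of mine (not part of rpart.plot), it keeps just a few of the arguments from my call, and it assumes an "anova" tree whose node estimates can be read from the frame$yval column of the fitted object.

```r
library(rpart)
library(rpart.plot)

# Hypothetical helper (not part of rpart.plot): plot a regression tree with a
# blue/red diverging palette whose two halves meet at `thresh`.
plot_signed_tree <- function(tree, thresh = 0, ...) {
  prp(tree,
      type = 2,
      extra = 101,
      fallen.leaves = TRUE,
      box.palette = "BuRd",  # two-color diverging palette
      pal.thresh = thresh,   # value at which blue switches to red
      ...)
}

# Same simulation as the first setup above (response centered at zero).
set.seed(1986)
X <- matrix(rnorm(2000, 0, 1), nrow = 1000, ncol = 2)
epsilon <- rnorm(1000, 0, 0.01)
dta <- data.frame(X, y = X[, 1] + X[, 2] + epsilon)
my.tree <- rpart(y ~ X1 + X2, data = dta, method = "anova", maxdepth = 3)

# Node estimates are stored in my.tree$frame$yval; tabulate their signs to
# see how many nodes should end up on each side of the threshold.
node_sign <- ifelse(my.tree$frame$yval < 0, "negative", "positive")
print(table(node_sign))

plot_signed_tree(my.tree)
```

Any extra arguments (for example branch.col = "gray") are passed through to prp() via the dots, and thresh can be changed if some value other than zero should separate the two colors.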
Even though this may count against me, I will leave the answer here for future users and close the question rather than delete it.