Customizing labels in SHAPforxgboost plots
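I'm creating some SHAP score plots to visualize a model I built with xgboost. I used the SHAPforxgboost package, which works well, and now I want to use these figures (especially the one from shap.plot.summary()) in a text document I'm writing. However, the labels/titles on the x and y axes are in a very small font, and I'm wondering whether there is a way to make them larger and easier to read.
I used a setup very similar to the one shown here: https://www.rdocumentation.org/packages/SHAPforxgboost/versions/0.0.2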
library("SHAPforxgboost")
y_var <- "diffcwv"
# dataXY_df is a data.table: `..y_var` refers to the column named in y_var, and `-` drops it
dataX <- dataXY_df[, -..y_var]
# hyperparameter tuning results
param_dart <- list(objective = "reg:squarederror", # for regression ("reg:linear" is deprecated)
                   eta = 0.018,
                   max_depth = 10,
                   gamma = 0.009,
                   subsample = 0.98,
                   colsample_bytree = 0.86)
n_rounds <- 366
# Pass the tuned settings via `params` (`xgb_param` is not an argument of xgboost())
mod <- xgboost::xgboost(data = as.matrix(dataX), label = as.matrix(dataXY_df[[y_var]]),
                        params = param_dart, nrounds = n_rounds,
                        verbose = FALSE, nthread = parallel::detectCores() - 2,
                        early_stopping_rounds = 8)
# To return the SHAP values and ranked features by mean|SHAP|
shap_values <- shap.values(xgb_model = mod, X_train = dataX)
# The ranked features by mean |SHAP|
shap_values$mean_shap_score
# To prepare the long-format data:
shap_long <- shap.prep(xgb_model = mod, X_train = dataX)
# is the same as: using given shap_contrib
shap_long <- shap.prep(shap_contrib = shap_values$shap_score, X_train = dataX)
# (Notice that there will be a data.table warning from `melt.data.table` due to `dayint` coerced from integer to double)
# **SHAP summary plot**
shap.plot.summary(shap_long)
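The output of shap.plot.summary() is:
[summary plot image not available]
More specifically, I'm interested in increasing the font size of each descriptor (feature name) on the y-axis.
Look at the code here: because the plot is built with ggplot, you should be able to override the default label sizes by adding your own theme() settings to the plot it returns.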
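Since shap.plot.summary() returns an ordinary ggplot object, a theme() added afterwards with `+` should take precedence over the theme() set inside the function. The sketch below shows this on a small stand-in plot built with ggplot2 alone; the `toy` data frame, its feature names and values are made up for illustration and only mimic the layout of the SHAP summary plot (features on the vertical axis via coord_flip()). With the real package you would add the same theme() call to `shap.plot.summary(shap_long)`.

```r
library(ggplot2)

# Hypothetical stand-in for shap_long: one row per (feature, observation)
set.seed(1)
toy <- data.frame(
  variable = factor(rep(c("Petal.Length", "Petal.Width", "Sepal.Length"), each = 50)),
  value    = c(rnorm(50, 0, 0.6), rnorm(50, 0, 0.2), rnorm(50, 0, 0.03))
)

# Plot with small fonts, similar in spirit to the package's built-in theme
p <- ggplot(toy, aes(x = variable, y = value)) +
  geom_jitter(width = 0.2, height = 0) +
  coord_flip() +
  theme_bw() +
  theme(axis.title.x = element_text(size = 10))

# A later theme() call overrides earlier settings for the same elements.
# Theme elements follow the axes as drawn, so after coord_flip()
# axis.text.y controls the feature names on the vertical axis.
p_big <- p + theme(
  axis.text.y  = element_text(size = 16),  # feature names
  axis.text.x  = element_text(size = 14),  # SHAP value tick labels
  axis.title.x = element_text(size = 14)   # x-axis title
)

print(p_big)
```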
Example using the shap.plot.summary.wrap2 function:
library("SHAPforxgboost")
library("ggplot2")
data("iris")
X1 <- as.matrix(iris[, -5])
# Species codes (1, 2, 3) used as a numeric regression target
mod1 <- xgboost::xgboost(
  data = X1, label = as.numeric(iris$Species), gamma = 0, eta = 1,
  lambda = 0, nrounds = 1, verbose = FALSE)
# shap.values(model, X_dataset) returns the SHAP
# data matrix and ranked features by mean|SHAP|
shap_values <- shap.values(xgb_model = mod1, X_train = X1)
shap_values$mean_shap_score
#> Petal.Length Petal.Width Sepal.Length Sepal.Width
#> 0.62935975 0.21664035 0.02910357 0.00000000
shap_values_iris <- shap_values$shap_score
# shap.prep() returns the long-format SHAP data from either the model or a given SHAP matrix (shap_contrib)
shap_long_iris <- shap.prep(xgb_model = mod1, X_train = X1)
# is the same as: using given shap_contrib
shap_long_iris <- shap.prep(shap_contrib = shap_values_iris, X_train = X1)
# **SHAP summary plot**
# shap.plot.summary(shap_long_iris, scientific = TRUE)
# shap.plot.summary(shap_long_iris, x_bound = 1.5, dilute = 10)
# Alternative options to make the same plot:
# option 1: from the xgboost model
# shap.plot.summary.wrap1(mod1, X = as.matrix(iris[,-5]), top_n = 3)
# option 2: supply a self-made SHAP values dataset
# (e.g. sometimes as output from cross-validation)
shap.plot.summary.wrap2(shap_values_iris, X1, top_n = 3) +
ggplot2::theme(axis.text.y = element_text(size = 20))
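Although cbo posted an answer that is sufficient for most cases, I could not change the size of the labels on the y-axis that way (i.e., the numbers 0.629, 0.219, 0.029). The best solution I found was to use
shap.plot.summary <- edit(shap.plot.summary)
to edit the ggplot settings inside the function. For anyone curious, the ggplot theme settings I ended up using for the plot are: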
theme(axis.line.y = element_blank(),
      axis.ticks.y = element_blank(),
      legend.position = "bottom",
      legend.title = element_text(size = 25),
      legend.text = element_text(size = 25),
      axis.title.x = element_text(size = 25),
      axis.text.y = element_text(size = 40),
      axis.text.x.bottom = element_text(size = 20))
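A likely reason the numbers (0.629, 0.219, ...) ignore theme() is that, as far as I can tell from the package source, shap.plot.summary() draws them with a geom_text() layer whose size is fixed inside the function, while theme() only controls axis text, titles and legends. An alternative to edit() might be to change that layer's size on the returned plot. The sketch below is an assumption-laden workaround, not a documented API: the helper `enlarge_text_layers()` is hypothetical, it relies on the ggplot2 internal `aes_params` field of a layer, and it assumes the mean |SHAP| labels are the only geom_text() layer. It is demonstrated on a made-up `toy` plot rather than the real SHAP output.

```r
library(ggplot2)

# Hypothetical stand-in: points plus a text layer of per-feature means,
# mimicking mean |SHAP| labels drawn at a fixed size
set.seed(1)
toy <- data.frame(
  variable = factor(rep(c("Petal.Length", "Petal.Width", "Sepal.Length"), each = 50)),
  value    = c(rnorm(50, 0, 0.6), rnorm(50, 0, 0.2), rnorm(50, 0, 0.03))
)
means <- aggregate(value ~ variable, data = toy, FUN = function(v) mean(abs(v)))

p <- ggplot(toy, aes(x = variable, y = value)) +
  geom_jitter(width = 0.2, height = 0) +
  geom_text(data = means,
            aes(x = variable, y = -Inf, label = sprintf("%.3f", value)),
            size = 3, hjust = -0.2, fontface = "bold") +
  coord_flip()

# Hypothetical helper: set a fixed text size on every geom_text() layer.
# Layers are ggproto objects (environments), so the change is made in place
# and is also visible through any other reference to the same plot.
enlarge_text_layers <- function(plot, size) {
  for (lyr in plot$layers) {
    if (inherits(lyr$geom, "GeomText")) {
      lyr$aes_params$size <- size
    }
  }
  invisible(plot)
}

p2 <- enlarge_text_layers(p, 6)
print(p2)
```

With the real package, something like `enlarge_text_layers(shap.plot.summary(shap_long), 6) + theme(axis.text.y = element_text(size = 16))` should enlarge both the numbers and the feature names, provided the installed version still draws the numbers with geom_text(). Unlike edit(), this leaves the package function itself unchanged.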