有没有一种巧妙的方法可以用 geom_quantile() 中的方程和其他统计数据来标记 ggplot 图?

Is there a neat approach to label a ggplot plot with the equation and other statistics from geom_quantile()?

我想以类似于 geom_smooth(method="lm") 拟合线性回归(我之前使用 ggpmiscawesome)。例如,这段代码:

# quantile regression example with ggpmisc equation
# basic quantile code from here:
# https://ggplot2.tidyverse.org/reference/geom_quantile.html

library(tidyverse)
library(ggpmisc)
# see ggpmisc vignette for stat_poly_eq() code below:
# https://cran.r-project.org/web/packages/ggpmisc/vignettes/user-guide.html#stat_poly_eq

my_formula <- y ~ x
#my_formula <- y ~ poly(x, 3, raw = TRUE)

# linear ols regression with equation labelled
m <- ggplot(mpg, aes(displ, 1 / hwy)) +
  geom_point()

m + 
  geom_smooth(method = "lm", formula = my_formula) +
  stat_poly_eq(aes(label =  paste(stat(eq.label), "*\" with \"*", 
                                  stat(rr.label), "*\", \"*", 
                                  stat(f.value.label), "*\", and \"*",
                                  stat(p.value.label), "*\".\"",
                                  sep = "")),
               formula = my_formula, parse = TRUE, size = 3)  

生成这个:

对于分位数回归,您可以将 geom_smooth() 换成 geom_quantile() 并绘制一条可爱的分位数回归线(在本例中为中位数):

# quantile regression - no equation labelling
m + 
  geom_quantile(quantiles = 0.5)
  

您如何将摘要统计信息输出到标签中,或者如何在旅途中重新创建它们? (即除了在调用 ggplot 之前进行回归,然后将其传递给然后注释(例如,类似于线性回归所做的 or

@mark-neal stat_fit_glance() 确实适用于 quantreg::rq()。然而,使用 stat_fit_glance() 涉及更多。此统计数据不“知道”从 glance() 中得到什么,因此必须手动 assemble 和 label

需要知道对此有什么可用的。可以 运行 在 ggplot 之外拟合模型并使用 glance() 找出它 return 的列,或者可以在包 [=67= 的帮助下在 ggplot 中执行此操作].我将展示这个替代方案,从上面的代码示例继续。

library(gginnards)

m + 
  geom_quantile(quantiles = 0.5) +
  stat_fit_glance(method = "rq", method.args = list(formula = y ~ x), geom = "debug")

geom_debug() 默认情况下只是将它的输入打印到 R 控制台,它的输入就是统计 returns.

# A tibble: 1 x 11
   npcx  npcy   tau logLik    AIC    BIC df.residual     x      y PANEL group
  <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>       <int> <dbl>  <dbl> <fct> <int>
1    NA    NA   0.5   816. -1628. -1621.         232  1.87 0.0803 1        -1

我们可以使用 after_stat()(早期版本为 stat() 并包含名称 ...)访问每一列。我们需要使用 ... 的编码符号进行格式化=26=]。如果像本例我们assemble一个需要解析成表达式的字符串,那么还需要parse = TRUE

m + 
  geom_quantile(quantiles = 0.5) +
  stat_fit_glance(method = "rq", method.args = list(formula = y ~ x), 
                  mapping = aes(label = sprintf('italic(tau)~"="~%.2f~~AIC~"="~%.3g~~BIC~"="~%.3g',
                                                after_stat(tau), after_stat(AIC), after_stat(BIC))),
                  parse = TRUE)

此示例生成以下图。

对于 stat_fit_tidy(),同样的方法应该有效。但是,在 'ggpmisc' (<= 0.3.7) 中,它使用“lm”而不是“rq”。此错误已在 'ggpmisc' (>= 0.3.8) 中修复,现在在 CRAN 中。

下面的示例仅适用于 'ggpmisc' (>= 0.3.8)

剩下的问题是glance()tidy()return中的tibble是否包含了想要添加到剧情中的信息,这似乎不是tidy.qr() 的情况,至少在默认情况下是这样。但是,tidy.rq() 有一个参数 se.type,用于确定 return 在 tibble 中编辑的值。修订后的 stat_fit_tidy() 接受要传递给 tidy() 的命名参数,使以下成为可能。

m + 
  geom_quantile(quantiles = 0.5) +
  stat_fit_tidy(method = "rq",
                method.args = list(formula = y ~ x), 
                tidy.args = list(se.type = "nid"),
                mapping = aes(label = sprintf('y~"="~%.3g~+~%.3g~x*", with "*italic(P)~"="~%.3f',
                                              after_stat(Intercept_estimate), 
                                              after_stat(x_estimate),
                                              after_stat(x_p.value))),
                parse = TRUE)

此示例生成以下图。

定义一个新的统计数据 stat_rq_eq() 会使这更简单:

stat_rq_eqn <- function(formula = y ~ x, tau = 0.5, ...) {
  stat_fit_tidy(method = "rq",
                method.args = list(formula = formula, tau = tau), 
                tidy.args = list(se.type = "nid"),
                mapping = aes(label = sprintf('y~"="~%.3g~+~%.3g~x*", with "*italic(P)~"="~%.3f',
                                              after_stat(Intercept_estimate), 
                                              after_stat(x_estimate),
                                              after_stat(x_p.value))),
                parse = TRUE,
                ...)
}

答案变为:

m + 
  geom_quantile(quantiles = 0.5) +
  stat_rq_eqn(tau = 0.5)

请将此视为 Pedro 出色回答的附录,他在其中完成了大部分繁重的工作 - 这增加了一些演示调整(颜色和线型)和代码以简化多个分位数,生成如下图:

library(tidyverse)
library(ggpmisc) #ensure version 0.3.8 or greater
library(quantreg)
library(generics)

my_formula <- y ~ x
#my_formula <- y ~ poly(x, 3, raw = TRUE)

# base plot
m <- ggplot(mpg, aes(displ, 1 / hwy)) +
  geom_point()

# function for labelling
# Doesn't neatly handle P values (e.g return "P<0.001 where appropriate)

stat_rq_eqn <- function(formula = y ~ x, tau = 0.5, colour = "red", label.y = 0.9, ...) {
  stat_fit_tidy(method = "rq",
                method.args = list(formula = formula, tau = tau), 
                tidy.args = list(se.type = "nid"),
                mapping = aes(label = sprintf('italic(tau)~"="~%.3f~";"~y~"="~%.3g~+~%.3g~x*", with "~italic(P)~"="~%.3g',
                                              after_stat(x_tau),
                                              after_stat(Intercept_estimate), 
                                              after_stat(x_estimate),
                                              after_stat(x_p.value))),
                parse = TRUE,
                colour = colour,
                label.y = label.y,
                ...)
}

# This works, though with double entry of plot specs
# custom colours and linetype
# 
# 


m + 
  geom_quantile(quantiles = c(0.1, 0.5, 0.9), 
                aes(colour = as.factor(..quantile..),
                    linetype = as.factor(..quantile..))
                )+
  scale_color_manual(values = c("red","purple","darkgreen"))+
  scale_linetype_manual(values = c("dotted", "dashed", "solid"))+
  stat_rq_eqn(tau = 0.1, colour = "red", label.y = 0.9)+
  stat_rq_eqn(tau = 0.5, colour = "purple", label.y = 0.95)+
  stat_rq_eqn(tau = 0.9, colour = "darkgreen", label.y = 1.0)+
  theme(legend.position = "none") # suppress legend


# not a good habit to have double entry above
# modified with reference to tibble for plot specs, 
# though still a stat_rq_eqn call for each quantile and manual vertical placement
# https://www.r-bloggers.com/2019/06/curly-curly-the-successor-of-bang-bang/

my_tau = c(0.1, 0.5, 0.9)
my_colours = c("red","purple","darkgreen")
my_linetype = c("dotted", "dashed", "solid")

quantile_plot_specs <- tibble(my_tau, my_colours, my_linetype)

m + 
  geom_quantile(quantiles = {{quantile_plot_specs$my_tau}}, 
                aes(colour = as.factor(..quantile..),
                    linetype = as.factor(..quantile..))
  )+
  scale_color_manual(values = {{quantile_plot_specs$my_colours}})+
  scale_linetype_manual(values = {{quantile_plot_specs$my_linetype}})+
  stat_rq_eqn(tau = {{quantile_plot_specs$my_tau[1]}}, colour = {{quantile_plot_specs$my_colours[1]}}, label.y = 0.9)+
  stat_rq_eqn(tau = {{quantile_plot_specs$my_tau[2]}}, colour = {{quantile_plot_specs$my_colours[2]}}, label.y = 0.95)+
  stat_rq_eqn(tau = {{quantile_plot_specs$my_tau[3]}}, colour = {{quantile_plot_specs$my_colours[3]}}, label.y = 1.0)+
  theme(legend.position = "none")

Package 'ggpmisc' (>= 0.4.5) 允许更简单的答案,这更接近@MarkNeal 在他关于中值回归的问题中所希望的解决方案。当使用最新版本的 'ggpmisc' 时,这个答案应该优于早期的答案。未显示:将 se = FALSE 传递给 stat_quant_line() 会禁用置信带。

library(ggplot2)
library(ggpmisc)
#> Loading required package: ggpp
#> 
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#> 
#>     annotate

m <- ggplot(mpg, aes(displ, 1 / hwy)) +
  geom_point()

m + 
  stat_quant_line(quantiles = 0.5) +
  stat_quant_eq(aes(label =  paste(after_stat(eq.label), "*\" with \"*", 
                                   after_stat(rho.label), "*\", \"*", 
                                   after_stat(n.label), "*\".\"",
                                   sep = "")),
                quantiles = 0.5,
                size = 3)  
#> Warning in rq.fit.br(x, y, tau = tau, ci = TRUE, ...): Solution may be nonunique

reprex package (v2.0.1)

创建于 2022-06-03

默认绘制中位数和四分位数。

m + 
  stat_quant_line() +
  stat_quant_eq(aes(label =  paste(after_stat(eq.label), "*\" with \"*", 
                                   after_stat(rho.label), "*\", \"*", 
                                   after_stat(n.label), "*\".\"",
                                   sep = "")),
                size = 3)  
#> Warning in rq.fit.br(x, y, tau = tau, ci = TRUE, ...): Solution may be nonunique

reprex package (v2.0.1)

创建于 2022-06-03

我们还可以轻松地将分位数映射到 colorlinetype 美学。

m + 
  stat_quant_line(aes(linetype = after_stat(quantile.f),
                  color = after_stat(quantile.f))) +
  stat_quant_eq(aes(label =  paste(after_stat(eq.label), "*\" with \"*", 
                                   after_stat(rho.label), "*\", \"*", 
                                   after_stat(n.label), "*\".\"",
                                   sep = ""),
                    color = after_stat(quantile.f)),
                size = 3)  
#> Warning in rq.fit.br(x, y, tau = tau, ci = TRUE, ...): Solution may be nonunique

reprex package (v2.0.1)

创建于 2022-06-03

我们还可以使用 stat_quant_band() 而不是 stat_quant_line() 将四分位数绘制为带。

m + 
  stat_quant_band() +
  stat_quant_eq(aes(label =  paste(after_stat(eq.label), "*\" with \"*", 
                                   after_stat(rho.label), "*\", \"*", 
                                   after_stat(n.label), "*\".\"",
                                   sep = "")),
                size = 3)  
#> Warning in rq.fit.br(x, y, tau = tau, ci = TRUE, ...): Solution may be nonunique

reprex package (v2.0.1)

创建于 2022-06-03