Error using sapply and split to show different p-values and r^2 values on a faceted ggplot

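I am trying to show a different p-value and r^2 on each panel of a plot made with ggplot. The plot has to be faceted because I am working with many different factors in the data. Every panel should show a linear model, and I want each panel to display its own p-value and r^2 in its own space.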

I have been using sapply with split to break up the data frame and calculate the r^2 and p-values, and then geom_text(label = examplefunction) to put them back on the plot, but I keep getting the error "Error: Aesthetics must be either length 1 or the same as the data (244): x, y, label, hjust, vjust".

Here is an example using the "tips" data frame from the reshape package:

library(reshape)   # provides the tips data frame
library(ggplot2)   # needed for ggplot(), geom_text(), facet_grid()

lm_equation <- function(tips){
  sapply(split(tips, list(tips$sex, tips$day)), function(tips){
    m <- lm(tips$tip ~ tips$total_bill, tips);
    eq <- substitute(~~italic(r)^2~"="~rvalue*","~italic(p)~"="~pvalue, 
                     list(rvalue = sprintf("%.2f",sign(coef(m)[2])*sqrt(summary(m)$r.squared)),
                          pvalue = format(summary(m)$coefficients[2,4], digits = 2)))
    as.character(as.expression(eq));
  })
}

scat <- ggplot(tips, aes(tip, total_bill))
scat +
  geom_point(size = 5, alpha = 0.9)+
  labs(x = "tip", y = "bill total")+
  geom_smooth(method=lm, colour = "#000000", se = F)+
  facet_grid(sex~day, scales = "free")+
  geom_text(x = min(tips$tip), y = max(tips$total_bill-10), label = lm_equation(tips), parse = T, vjust = "inward", hjust = "inward")+
  theme_classic() + 
  theme(text = element_text(size = 15))

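Frustratingly, the code does run if I take out the split, but then the p-values and r^2 values are meaningless, because they are computed from the whole data frame rather than from each facet group.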

Working code example:

lm_equation2 <- function(tips){
    m <- lm(tips$tip ~ tips$total_bill, tips);
    eq <- substitute(~~italic(r)^2~"="~rvalue*","~italic(p)~"="~pvalue, 
                     list(rvalue = sprintf("%.2f",sign(coef(m)[2])*sqrt(summary(m)$r.squared)),
                          pvalue = format(summary(m)$coefficients[2,4], digits = 2)))
    as.character(as.expression(eq));
  }

scat2 <- ggplot(tips, aes(tip, total_bill))
scat2 +
  geom_point(size = 5, alpha = 0.9)+
  labs(x = "tip", y = "bill total")+
  geom_smooth(method=lm, colour = "#000000", se = F)+
  facet_grid(sex~day, scales = "free")+
  geom_text(x = min(tips$tip), y = max(tips$total_bill-10), label = lm_equation2(tips), parse = T, vjust = "inward", hjust = "inward")+
  theme_classic() + 
  theme(text = element_text(size = 15))

What am I missing here? Do I need to resort to subsetting the data?

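Here is an example that takes what you already have and organizes the results into a data.frame containing all the variables needed for plotting. In particular, the faceting variables must be present in the dataset.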
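The error in the question looks like a length mismatch: sapply over split returns one label per sex/day group, while a geom_text() layer that inherits the full plot data expects either a single label or one per row. A minimal base R sketch of that mismatch follows. The `toy` data frame is a simulated stand-in with the same column names as tips, not the real data, and the label is simplified to a bare R^2 string.

```r
# Toy stand-in for `tips` (hypothetical, simulated data with the same column names)
set.seed(1)
toy <- data.frame(
  sex = rep(c("Female", "Male"), each = 20),
  day = rep(c("Fri", "Sat"), times = 20),
  total_bill = runif(40, 5, 50),
  stringsAsFactors = FALSE
)
toy$tip <- 0.15 * toy$total_bill + rnorm(40)

# One model per sex/day combination, as in lm_equation()
groups <- split(toy, list(toy$sex, toy$day))
labels <- sapply(groups, function(d) {
  fit <- lm(tip ~ total_bill, data = d)
  sprintf("%.2f", summary(fit)$r.squared)
})

length(labels)  # number of sex/day groups
nrow(toy)       # number of observations
names(labels)   # group names in "sex.day" form
```

Because the number of labels matches neither 1 nor the number of rows, the labels need their own data frame that also carries the faceting variables.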

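First, put the label and the name of each group (the sex and day combination) into a data.frame as columns. You also need a column for the position of each equation, named after the original x and y variables.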

lab_dat = data.frame(group = names(lm_equation(tips)),
                     tip = min(tips$tip),
                     total_bill = max(tips$total_bill - 10),
                     label = lm_equation(tips))
lab_dat

                  group tip total_bill                                                            label
Female.Fri   Female.Fri   1      40.81   ~~italic(r)^2 ~ "=" ~ "0.72" * "," ~ italic(p) ~ "=" ~ "0.029"
Male.Fri       Male.Fri   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.92" * "," ~ italic(p) ~ "=" ~ "0.00017"
Female.Sat   Female.Sat   1      40.81  ~~italic(r)^2 ~ "=" ~ "0.50" * "," ~ italic(p) ~ "=" ~ "0.0071"
Male.Sat       Male.Sat   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.77" * "," ~ italic(p) ~ "=" ~ "1.4e-12"
Female.Sun   Female.Sun   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.74" * "," ~ italic(p) ~ "=" ~ "0.00041"
Male.Sun       Male.Sun   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.46" * "," ~ italic(p) ~ "=" ~ "0.00032"
Female.Thur Female.Thur   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.87" * "," ~ italic(p) ~ "=" ~ "9.4e-11"
Male.Thur     Male.Thur   1      40.81   ~~italic(r)^2 ~ "=" ~ "0.76" * "," ~ italic(p) ~ "=" ~ "1e-06"

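Next, split the group variable, which combines sex and day, back into two separate variables. I use separate() from the tidyr package for this. The new variables should have the same names as in the original dataset, because they are the faceting variables and must be present in the dataset used for any plot layer.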

library(tidyr)
lab_dat = separate(lab_dat, group, c("sex", "day"))
lab_dat

               sex  day tip total_bill                                                            label
Female.Fri  Female  Fri   1      40.81   ~~italic(r)^2 ~ "=" ~ "0.72" * "," ~ italic(p) ~ "=" ~ "0.029"
Male.Fri      Male  Fri   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.92" * "," ~ italic(p) ~ "=" ~ "0.00017"
Female.Sat  Female  Sat   1      40.81  ~~italic(r)^2 ~ "=" ~ "0.50" * "," ~ italic(p) ~ "=" ~ "0.0071"
Male.Sat      Male  Sat   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.77" * "," ~ italic(p) ~ "=" ~ "1.4e-12"
Female.Sun  Female  Sun   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.74" * "," ~ italic(p) ~ "=" ~ "0.00041"
Male.Sun      Male  Sun   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.46" * "," ~ italic(p) ~ "=" ~ "0.00032"
Female.Thur Female Thur   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.87" * "," ~ italic(p) ~ "=" ~ "9.4e-11"
Male.Thur     Male Thur   1      40.81   ~~italic(r)^2 ~ "=" ~ "0.76" * "," ~ italic(p) ~ "=" ~ "1e-06"
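If tidyr is not available, the same split can probably be done in base R with strsplit(). This is only a sketch: the `group` vector below is a hypothetical sample of names in the "sex.day" form that split() produces, and `lab_parts` is an illustrative name.

```r
# Hypothetical group names in the "sex.day" form produced by split()
group <- c("Female.Fri", "Male.Fri", "Female.Sat", "Male.Sat")

# Split each name on the literal "." and bind the pieces into a two-column matrix
parts <- do.call(rbind, strsplit(group, ".", fixed = TRUE))

# Name the columns after the faceting variables used in facet_grid(sex ~ day)
lab_parts <- data.frame(sex = parts[, 1], day = parts[, 2],
                        stringsAsFactors = FALSE)
lab_parts
```

This assumes that neither the sex nor the day values contain a "." themselves.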

Now you can draw one label per facet by using lab_dat for the geom_text() layer:

ggplot(tips, aes(tip, total_bill)) +
     geom_point(size = 5, alpha = 0.9)+
     geom_smooth(method=lm, colour = "#000000", se = FALSE)+
     facet_grid(sex ~ day, scales = "free")+
     geom_text(data = lab_dat, aes(label = label), parse = TRUE, 
               vjust = "inward", hjust = "inward")
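
For reference, the whole label table could also be built in one pass. The sketch below is an alternative arrangement, not the code above. `make_labels()` is a hypothetical helper and `toy` is simulated stand-in data with the same column names as tips. It differs from lm_equation() in two ways: it reports summary(fit)$r.squared directly instead of the signed square root, and it takes the label position from each group rather than from the whole data set.

```r
# Toy stand-in for `tips` (hypothetical, simulated data with the same column names)
set.seed(1)
toy <- data.frame(
  sex = rep(c("Female", "Male"), each = 20),
  day = rep(c("Fri", "Sat"), times = 20),
  total_bill = runif(40, 5, 50),
  stringsAsFactors = FALSE
)
toy$tip <- 0.15 * toy$total_bill + rnorm(40)

# Hypothetical helper: one row per sex/day group, holding the faceting
# variables, a label position, and a plotmath label string
make_labels <- function(dat) {
  groups <- split(dat, list(dat$sex, dat$day), drop = TRUE)
  rows <- lapply(groups, function(d) {
    fit <- lm(tip ~ total_bill, data = d)
    s <- summary(fit)
    data.frame(
      sex = d$sex[1],
      day = d$day[1],
      tip = min(d$tip),                # x position within this group
      total_bill = max(d$total_bill),  # y position within this group
      label = sprintf(
        'italic(R)^2 ~ "=" ~ "%.2f" * "," ~ italic(p) ~ "=" ~ "%s"',
        s$r.squared,
        format(s$coefficients[2, 4], digits = 2)
      ),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

lab_dat2 <- make_labels(toy)
lab_dat2
```

The result has the same columns as lab_dat, so it should be usable in the same geom_text(data = ..., aes(label = label), parse = TRUE) layer.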