Error using sapply and split to show different p-values and r^2 values on a faceted ggplot
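I am trying to display a different p-value and r^2 on each panel of a plot made with ggplot. The plot needs to be faceted because the data has several factors. Every panel should show a linear model, and I want each one to display its own p-value and r^2 in its own panel.
I have been using sapply to split the data frame, compute the r^2 and p-values, and then put them back on the plot with geom_text(label = examplefunction), but I keep getting the error "Error: Aesthetics must be either length 1 or the same as the data (244): x, y, label, hjust, vjust".
Here is an example using the "tips" data frame from the reshape package: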
library(reshape)   # provides the tips data frame
library(ggplot2)

lm_equation <- function(tips){
  # Fit one model per sex/day group and return one plotmath label per group
  sapply(split(tips, list(tips$sex, tips$day)), function(tips){
    m <- lm(tip ~ total_bill, data = tips)
    # rvalue is the signed correlation r, i.e. sqrt(R^2), although the label reads r^2
    eq <- substitute(~~italic(r)^2~"="~rvalue*","~italic(p)~"="~pvalue,
                     list(rvalue = sprintf("%.2f", sign(coef(m)[2]) * sqrt(summary(m)$r.squared)),
                          pvalue = format(summary(m)$coefficients[2,4], digits = 2)))
    as.character(as.expression(eq))
  })
}

scat <- ggplot(tips, aes(tip, total_bill))
scat +
  geom_point(size = 5, alpha = 0.9) +
  labs(x = "tip", y = "bill total") +
  geom_smooth(method = lm, colour = "#000000", se = FALSE) +
  facet_grid(sex ~ day, scales = "free") +
  # This layer raises the error: lm_equation() returns 8 labels for 244 rows
  geom_text(x = min(tips$tip), y = max(tips$total_bill - 10), label = lm_equation(tips),
            parse = TRUE, vjust = "inward", hjust = "inward") +
  theme_classic() +
  theme(text = element_text(size = 15))
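Frustratingly, the code runs if I take out the split, but then the p-values and r^2 values are meaningless, because they are computed from the whole data frame rather than from each facet group.
Working code example: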
lm_equation2 <- function(tips){
  # One model on the whole data frame, so a single label is returned
  m <- lm(tip ~ total_bill, data = tips)
  # rvalue is the signed correlation r, i.e. sqrt(R^2), although the label reads r^2
  eq <- substitute(~~italic(r)^2~"="~rvalue*","~italic(p)~"="~pvalue,
                   list(rvalue = sprintf("%.2f", sign(coef(m)[2]) * sqrt(summary(m)$r.squared)),
                        pvalue = format(summary(m)$coefficients[2,4], digits = 2)))
  as.character(as.expression(eq))
}

scat2 <- ggplot(tips, aes(tip, total_bill))
scat2 +
  geom_point(size = 5, alpha = 0.9) +
  labs(x = "tip", y = "bill total") +
  geom_smooth(method = lm, colour = "#000000", se = FALSE) +
  facet_grid(sex ~ day, scales = "free") +
  # A length-1 label is recycled, so every panel shows the same text
  geom_text(x = min(tips$tip), y = max(tips$total_bill - 10), label = lm_equation2(tips),
            parse = TRUE, vjust = "inward", hjust = "inward") +
  theme_classic() +
  theme(text = element_text(size = 15))
What am I missing here? Do I need to resort to subsetting the data?
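Here is an example that builds on what you already have and organizes the results into a data.frame containing all the variables needed for plotting. In particular, the faceting variables must be present in that dataset.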
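The error itself appears to come from a length mismatch, which a quick count makes visible. The sketch below assumes the reshape package is installed; n_labels and n_rows are names introduced only for this illustration, and no models are fitted. Without its own data argument, geom_text() inherits the plot data, so a label vector has to be of length 1 or 244, and a vector with one entry per sex/day group is neither.

```r
# Assumption: the reshape package (source of `tips`) is installed.
data(tips, package = "reshape")

# lm_equation() returns one label per element of this split
groups <- split(tips, list(tips$sex, tips$day))

n_labels <- length(groups)  # 8 sex/day combinations
n_rows   <- nrow(tips)      # 244 rows in the data the layer inherits

c(labels = n_labels, rows = n_rows)
```

A length-1 label, as returned by lm_equation2(), is recycled to every row, which is why that version runs but repeats the same text in every panel.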
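First, put the label and the name of each group (the combination of sex and day) into a data.frame as columns. You also need one column for each coordinate of the label position, named after the original x and y variables (tip and total_bill).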
lab_dat = data.frame(group = names(lm_equation(tips)),
                     tip = min(tips$tip),
                     total_bill = max(tips$total_bill - 10),
                     label = lm_equation(tips))
lab_dat
                  group tip total_bill label
Female.Fri   Female.Fri   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.72" * "," ~ italic(p) ~ "=" ~ "0.029"
Male.Fri       Male.Fri   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.92" * "," ~ italic(p) ~ "=" ~ "0.00017"
Female.Sat   Female.Sat   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.50" * "," ~ italic(p) ~ "=" ~ "0.0071"
Male.Sat       Male.Sat   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.77" * "," ~ italic(p) ~ "=" ~ "1.4e-12"
Female.Sun   Female.Sun   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.74" * "," ~ italic(p) ~ "=" ~ "0.00041"
Male.Sun       Male.Sun   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.46" * "," ~ italic(p) ~ "=" ~ "0.00032"
Female.Thur Female.Thur   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.87" * "," ~ italic(p) ~ "=" ~ "9.4e-11"
Male.Thur     Male.Thur   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.76" * "," ~ italic(p) ~ "=" ~ "1e-06"
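Next, split the group variable, which combines sex and day, back into two separate variables. I use separate() from the tidyr package for this. The new variables should have the same names as in the original dataset, because they are the faceting variables and must be present in the dataset used for any plot layer.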
library(tidyr)
lab_dat = separate(lab_dat, group, c("sex", "day"))
lab_dat
               sex  day tip total_bill label
Female.Fri  Female  Fri   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.72" * "," ~ italic(p) ~ "=" ~ "0.029"
Male.Fri      Male  Fri   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.92" * "," ~ italic(p) ~ "=" ~ "0.00017"
Female.Sat  Female  Sat   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.50" * "," ~ italic(p) ~ "=" ~ "0.0071"
Male.Sat      Male  Sat   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.77" * "," ~ italic(p) ~ "=" ~ "1.4e-12"
Female.Sun  Female  Sun   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.74" * "," ~ italic(p) ~ "=" ~ "0.00041"
Male.Sun      Male  Sun   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.46" * "," ~ italic(p) ~ "=" ~ "0.00032"
Female.Thur Female Thur   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.87" * "," ~ italic(p) ~ "=" ~ "9.4e-11"
Male.Thur     Male Thur   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.76" * "," ~ italic(p) ~ "=" ~ "1e-06"
Now you can draw one label per facet by using lab_dat as the data for the geom_text() layer.
ggplot(tips, aes(tip, total_bill)) +
  geom_point(size = 5, alpha = 0.9) +
  geom_smooth(method = lm, colour = "#000000", se = FALSE) +
  facet_grid(sex ~ day, scales = "free") +
  geom_text(data = lab_dat, aes(label = label), parse = TRUE,
            vjust = "inward", hjust = "inward")
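As a variation on the steps above, the label data frame can also be built with the faceting columns in place from the start, which avoids the separate() step. This is only a minimal sketch, not the method used in this answer: facet_label() and lab_dat2 are hypothetical names, reshape and ggplot2 are assumed to be installed, and the label reports summary(m)$r.squared directly, so its numbers will differ from the table above, which shows the signed square root.

```r
# Assumptions: reshape (for `tips`) and ggplot2 are installed.
library(ggplot2)
data(tips, package = "reshape")

# Hypothetical helper: plotmath label for one sex/day subset
facet_label <- function(d) {
  s <- summary(lm(tip ~ total_bill, data = d))
  sprintf('italic(R)^2 ~ "=" ~ "%.2f" * "," ~ italic(p) ~ "=" ~ "%s"',
          s$r.squared, format(s$coefficients[2, 4], digits = 2))
}

# One row per group, carrying the faceting variables sex and day
groups <- split(tips, list(tips$sex, tips$day), drop = TRUE)
lab_dat2 <- do.call(rbind, lapply(groups, function(d) {
  data.frame(
    sex = d$sex[1], day = d$day[1],
    tip = min(tips$tip),                     # x position of the label
    total_bill = max(tips$total_bill) - 10,  # y position of the label
    label = facet_label(d),
    stringsAsFactors = FALSE
  )
}))

p <- ggplot(tips, aes(tip, total_bill)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black") +
  facet_grid(sex ~ day, scales = "free") +
  geom_text(data = lab_dat2, aes(label = label), parse = TRUE,
            vjust = "inward", hjust = "inward")
p
```

Because sex and day are taken from the subsets themselves, they keep their factor levels, so the label rows line up with the panels without any string splitting.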