How to annotate a boxplot with p-values using a combination of ggpubr and ggsignif?
I am trying to create a boxplot that shows multiple comparisons. I made a toy dataset, and it seems to give me the same error I run into with my larger dataset.
library(tidyverse)
library(ggsignif)
library(ggpubr)
dat <- data.frame(measurement = c("750","850","900", "300","200","400", "20", "30", "50"),
diagnosis = c("Healthy", "Healthy", "Healthy","Moderate","Moderate","Moderate", "Sick", "Sick", "Sick"))
dat$measurement <- as.numeric(dat$measurement)
#List of comparisons
dat.compare <- list(c("Healthy", "Moderate"),
c("Healthy", "Sick"),
c("Moderate", "Sick"))
#Running Anova
dat.lm <- lm(measurement ~ diagnosis, data = dat)
TukeyHSD(aov(dat.lm))
Yields:
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = dat.lm)
$diagnosis
diff lwr upr p adj
Moderate-Healthy -4.3333333 -8.830369 0.1637022 0.0574078
Sick-Healthy -4.6666667 -9.163702 -0.1696312 0.0433911
Sick-Moderate -0.3333333 -4.830369 4.1637022 0.9720206
dat.p <- list("0.05","0.04", "0.97")
The p adj values are what I am trying to annotate onto my boxplot, using the following code:
ggboxplot(dat, x ="diagnosis" , y = "measurement" ,
color = "diagnosis", palette = "jco",
add = "jitter") +
ggsignif::geom_signif(data=dat,
comparisons = dat.compare, annotations=dat.p,
map_signif_level = TRUE)
When I run the boxplot code, it gives me the following warning:
Warning message:
Computation failed in `stat_signif()`:
names do not match previous names
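The final result should look something like this.
As far as I can tell, the names in the comparison list match the names in the data frame. I have been stuck on this for hours; any idea what I am doing wrong? Thanks!
One possible solution is to use geom_signif to add your values to the boxplot manually.
To do that, you first need to build a data frame containing the p-values, the x values being compared, and the y position at which each p-value should be drawn.
Here is an example of how to do that, starting from the Tukey test: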
Tukey_data <- TukeyHSD(aov(dat.lm))$diagnosis
library(dplyr)
maxvalues <- dat %>% group_by(diagnosis) %>% summarise(MAX = max(measurement))
pval <- as.data.frame(Tukey_data) %>% rownames_to_column("Group") %>%
rowwise() %>%
mutate(Start = unlist(strsplit(Group,"-"))[1],
End = unlist(strsplit(Group,"-"))[2]) %>%
left_join(.,maxvalues, by = c("Start" = "diagnosis")) %>%
left_join(.,maxvalues, by = c("End" = "diagnosis")) %>% ungroup() %>%
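mutate(ypos = c(12,10,8))
# Alternative to the hard-coded ypos above: derive it from the group maxima
# by replacing that last mutate() with the two lines below.
# mutate(End = factor(End, levels = c("Healthy","Moderate","Sick"))) %>% rowwise() %>%
# mutate(ypos = max(MAX.x, MAX.y)*(1+0.25*as.numeric(End)))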
# A tibble: 3 x 10
Group diff lwr upr `p adj` Start End MAX.x MAX.y ypos
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
1 Moderate-Healthy -4.33 -8.83 0.164 0.0574 Moderate Healthy 5 9 12
2 Sick-Healthy -4.67 -9.16 -0.170 0.0434 Sick Healthy 6 9 10
3 Sick-Moderate -0.333 -4.83 4.16 0.972 Sick Moderate 6 5 8
Then you can pass it to geom_signif as follows:
library(ggpubr)
library(ggsignif)
ggboxplot(dat, x ="diagnosis" , y = "measurement" ,
color = "diagnosis", palette = "jco",
add = "jitter") +
geom_signif(data = pval, manual = TRUE,
aes(xmax = End, xmin = Start, y_position= ypos, annotations = round(`p adj`,3)))
Does this answer your question?
I found that if I change both list formats to vectors, the package accepts them without error.
Thanks to @dc37 for the help.
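For readers who want to see that fix spelled out, here is a minimal sketch. It assumes the warning came from `dat.p` being a list rather than a character vector. In the sketch `annotations` is a plain character vector, while `comparisons` stays a list of length-2 vectors, which is the form the ggsignif documentation describes. Formatting the labels with `sprintf()` and spacing the brackets with `step_increase = 0.1` are illustrative choices, not part of the original posts.

```r
library(ggpubr)
library(ggsignif)

# Toy data from the question, with measurement already numeric
dat <- data.frame(
  measurement = c(750, 850, 900, 300, 200, 400, 20, 30, 50),
  diagnosis = rep(c("Healthy", "Moderate", "Sick"), each = 3)
)

# Comparisons: a list of length-2 character vectors
dat.compare <- list(c("Healthy", "Moderate"),
                    c("Healthy", "Sick"),
                    c("Moderate", "Sick"))

# Adjusted p-values from the Tukey test; the rows come out as
# Moderate-Healthy, Sick-Healthy, Sick-Moderate, matching dat.compare
tukey <- TukeyHSD(aov(measurement ~ diagnosis, data = dat))$diagnosis

# Annotations: a character vector (not a list), one label per comparison
dat.p <- sprintf("%.3f", tukey[, "p adj"])

p <- ggboxplot(dat, x = "diagnosis", y = "measurement",
               color = "diagnosis", palette = "jco", add = "jitter") +
  geom_signif(comparisons = dat.compare, annotations = dat.p,
              step_increase = 0.1)  # stagger the brackets so they do not overlap
p
```

Because the labels are matched to the comparisons by position, it is worth checking that the order of `dat.compare` matches the row order of the Tukey table before plotting.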