How to annotate a boxplot with p-values using a combination of ggpubr and ggsignif?

I am trying to create a boxplot that shows multiple comparisons. I made a toy dataset, and it seems to give me the same error I get with my larger dataset.

library(tidyverse)
library(ggsignif)
library(ggpubr)


dat <- data.frame(measurement = c("750","850","900", "300","200","400", "20", "30", "50"),
                   diagnosis = c("Healthy", "Healthy", "Healthy","Moderate","Moderate","Moderate",  "Sick", "Sick", "Sick"))

dat$measurement <- as.numeric(dat$measurement)

#List of comparisons
dat.compare <- list(c("Healthy", "Moderate"), 
                    c("Healthy", "Sick"), 
                    c("Moderate", "Sick"))

#Running Anova
dat.lm <- lm(measurement ~ diagnosis, data = dat)
TukeyHSD(aov(dat.lm))
Yields: 
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = dat.lm)

$diagnosis
                       diff       lwr        upr     p adj
Moderate-Healthy -4.3333333 -8.830369  0.1637022 0.0574078
Sick-Healthy     -4.6666667 -9.163702 -0.1696312 0.0433911
Sick-Moderate    -0.3333333 -4.830369  4.1637022 0.9720206

dat.p <- list("0.05","0.04", "0.97")

The p adj values are what I am trying to annotate on my boxplot, using the following code:

ggboxplot(dat, x ="diagnosis" , y = "measurement" ,
               color = "diagnosis", palette = "jco",
               add = "jitter") +
  ggsignif::geom_signif(data=dat, 
                        comparisons = dat.compare, annotations=dat.p, 
                        map_signif_level = TRUE)

When I run the boxplot code, it gives me the following error:

Warning message:
Computation failed in `stat_signif()`:
names do not match previous names 

The final result should look something like this:

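As far as I can tell, the names in the comparison list match the names in the data frame. I have been stuck on this for a few hours. Any idea what I am doing wrong? Thanks!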

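One possible solution is to add your values to the boxplot manually with geom_signif.

To do that, you first need to build a data frame containing the p-values, the x values being compared, and the y position at which each p-value should be drawn.

Here is an example of how to do this starting from the Tukey test: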

Tukey_data <- TukeyHSD(aov(dat.lm))$diagnosis

library(dplyr)
maxvalues <- dat %>% group_by(diagnosis) %>% summarise(MAX = max(measurement))

pval <- as.data.frame(Tukey_data) %>% rownames_to_column("Group") %>%
  rowwise() %>%
  mutate(Start = unlist(strsplit(Group,"-"))[1],
         End = unlist(strsplit(Group,"-"))[2]) %>%
  left_join(.,maxvalues, by = c("Start" = "diagnosis")) %>%
  left_join(.,maxvalues, by = c("End" = "diagnosis")) %>% ungroup() %>%
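  # Hard-coded bracket heights, one per comparison (these give the ypos column shown below)
  mutate(ypos = c(12,10,8))

# Alternative to the hard-coded heights: derive ypos from the group maxima
# pval <- pval %>%
#   mutate(End = factor(End, levels = c("Healthy","Moderate","Sick"))) %>% rowwise() %>%
#   mutate(ypos = max(MAX.x, MAX.y)*(1+0.25*as.numeric(End)))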

# A tibble: 3 x 10
  Group              diff   lwr    upr `p adj` Start    End      MAX.x MAX.y  ypos
  <chr>             <dbl> <dbl>  <dbl>   <dbl> <chr>    <chr>    <dbl> <dbl> <dbl>
1 Moderate-Healthy -4.33  -8.83  0.164  0.0574 Moderate Healthy      5     9    12
2 Sick-Healthy     -4.67  -9.16 -0.170  0.0434 Sick     Healthy      6     9    10
3 Sick-Moderate    -0.333 -4.83  4.16   0.972  Sick     Moderate     6     5     8
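
Heights such as c(12,10,8) only suit data on that scale. The following is a rough sketch of computing them from the data instead, assuming it is acceptable to stack every bracket above the overall maximum. `bracket_heights` is a hypothetical helper written for this illustration, not part of ggpubr or ggsignif, and the 10% step is an arbitrary choice.

```r
# Toy data from the question
dat <- data.frame(
  measurement = c(750, 850, 900, 300, 200, 400, 20, 30, 50),
  diagnosis   = rep(c("Healthy", "Moderate", "Sick"), each = 3)
)

# Hypothetical helper: place n brackets above the tallest observation,
# each one `step` (a fraction of the data range) higher than the previous
bracket_heights <- function(y, n, step = 0.1) {
  rng <- diff(range(y))
  max(y) + rng * step * seq_len(n)
}

ypos <- bracket_heights(dat$measurement, n = 3)
ypos
```

The result could then stand in for the hard-coded vector, for example mutate(ypos = bracket_heights(dat$measurement, n = n())) once the table has been ungrouped.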

Then you can pass it to geom_signif as follows:

library(ggpubr)
library(ggsignif)

ggboxplot(dat, x ="diagnosis" , y = "measurement" ,
          color = "diagnosis", palette = "jco",
          add = "jitter") +
  geom_signif(data = pval, manual = TRUE,
              aes(xmax = End, xmin = Start, y_position= ypos, annotations = round(`p adj`,3)))

Does this answer your question?

I found that if I change both of the lists to vectors, the package accepts them without error.
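
A minimal sketch of one reading of that fix, assuming the change that matters is passing the labels to `annotations` as a plain character vector rather than a list. Here `comparisons` is kept as a list of length-2 vectors, which is the form the ggsignif documentation describes. The labels are the rounded p adj values from the question, and the `step_increase` value is an arbitrary choice to keep the brackets apart.

```r
library(ggpubr)
library(ggsignif)

# Toy data from the question
dat <- data.frame(
  measurement = c(750, 850, 900, 300, 200, 400, 20, 30, 50),
  diagnosis   = rep(c("Healthy", "Moderate", "Sick"), each = 3)
)

# Comparisons stay a list of length-2 character vectors
dat.compare <- list(c("Healthy", "Moderate"),
                    c("Healthy", "Sick"),
                    c("Moderate", "Sick"))

# Labels as a character vector (not a list), in the same order as dat.compare
dat.p <- c("0.05", "0.04", "0.97")

p <- ggboxplot(dat, x = "diagnosis", y = "measurement",
               color = "diagnosis", palette = "jco",
               add = "jitter") +
  geom_signif(comparisons = dat.compare,
              annotations = dat.p,
              step_increase = 0.1)  # assumed spacing between brackets

print(p)
```

Because `annotations` supplies the labels directly, `map_signif_level` is left out of this sketch.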

Thanks to @dc37 for the help.