How to add comparison bars to a plot to denote which comparison a p value corresponds to

I'm working with the following data frame:

df1 <- structure(list(Genotype = structure(c(1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
.Label = c("miR-15/16 FL", "miR-15/16 cKO"), class = "factor"), 
Tissue = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L), .Label = c("iLN", "Spleen", "Skin", "Colon"), class = "factor"), 
`Cells/SC/Live/CD8—,, CD4+/Foxp3+,Median,<BV421-A>,CD127` = c(518L, 
715L, 572L, 599L, 614L, 881L, 743L, 722L, 779L, 843L, 494L, 
610L, 613L, 624L, 631L, 925L, 880L, 932L, 876L, 926L, 1786L, 
2079L, 2199L, 2345L, 2360L, 2408L, 2509L, 3129L, 3263L, 3714L, 
917L, NA, 1066L, 1059L, 939L, 1269L, 1047L, 974L, 1048L, 
1084L)),
.Names = c("Genotype", "Tissue", "Cells/SC/Live/CD8—,,CD4+/Foxp3+,Median,<BV421-A>,CD127"),
row.names = c(NA, -40L), class = c("tbl_df", "tbl", "data.frame"))

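I'm trying to plot it with ggplot2, with the boxplots and points grouped by "Tissue" and dodged by "Genotype". The significance values display correctly, but I'd like to add lines showing which comparison each one refers to. Each line should start at the center of a "miR-15/16 FL" boxplot, end at the center of the matching "miR-15/16 cKO" boxplot, and sit just below the significance value. Here is the code I'm using to generate the plot: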

library(ggplot2)
library(ggpubr)
color.groups <- c("black","red")
names(color.groups) <- unique(df1$Genotype)
shape.groups <- c(16, 1)
names(shape.groups) <- unique(df1$Genotype)

ggplot(df1, aes(x = Tissue, y = df1[3], color = Genotype, shape = Genotype)) +
  geom_boxplot(position = position_dodge(), outlier.shape = NA) +
  geom_point(position=position_dodge(width=0.75)) +
  ylim(0,1.2*max(df1[3], na.rm = TRUE)) +
  ylab('MFI CD127 (of CD4+ Foxp3+ T cells)') +
  scale_color_manual(values=color.groups) +
  scale_shape_manual(values=shape.groups) +
  theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
                     panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"),
                     axis.title.x=element_blank(), aspect.ratio = 1,
                     text = element_text(size = 9)) +
  stat_compare_means(show.legend = FALSE, label = 'p.format', method = 't.test',
                     label.y = c(0.1*max(df1[3], na.rm = TRUE) + max(df1[3][c(1:10),], na.rm = TRUE),
                                 0.1*max(df1[3], na.rm = TRUE) + max(df1[3][c(11:20),], na.rm = TRUE),
                                 0.1*max(df1[3], na.rm = TRUE) + max(df1[3][c(21:30),], na.rm = TRUE),
                                 0.1*max(df1[3], na.rm = TRUE) + max(df1[3][c(31:40),], na.rm = TRUE)))

Thanks for your help!

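I created the brackets with three calls to geom_segment: one for the horizontal bar and one for each downward end tick. These calls use a newly created dmax data frame, which holds the maximum value for each Tissue, as reference y-values for positioning both the brackets and the p-value labels. The values e, r, and w (defined in the code below) adjust these positions.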
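To make the bracket geometry concrete, here is a minimal base-R sketch that computes the same segment coordinates as the three geom_segment calls, but as one long data frame (one row per segment). The helper name bracket_segments and its tick argument are hypothetical; the defaults mirror the answer's e = 350, r = 0.6, w = 0.19 and its 60-unit tick length, and the dmax values are the per-Tissue maxima of the question's data.

```r
# Per-Tissue maxima of the measured column, i.e. what the answer's dmax$temp holds
dmax <- data.frame(
  Tissue = factor(c("iLN", "Spleen", "Skin", "Colon"),
                  levels = c("iLN", "Spleen", "Skin", "Colon")),
  temp = c(881, 932, 3714, 1269)
)

# Hypothetical helper: one row per segment (horizontal bar + two downward end ticks)
bracket_segments <- function(dmax, e = 350, r = 0.6, w = 0.19, tick = 60) {
  x <- as.numeric(dmax$Tissue)  # factor codes are the x positions on a discrete axis
  y <- dmax$temp + r * e        # bar height: r*e above the group maximum
  rbind(
    data.frame(part = "bar",   x = x - w, xend = x + w, y = y, yend = y),
    data.frame(part = "left",  x = x - w, xend = x - w, y = y, yend = y - tick),
    data.frame(part = "right", x = x + w, xend = x + w, y = y, yend = y - tick)
  )
}

segs <- bracket_segments(dmax)
print(segs[segs$part == "bar", ])
```

With a data frame like segs, a single layer such as geom_segment(data = segs, aes(x = x, xend = xend, y = y, yend = yend), colour = "grey30", inherit.aes = FALSE) could stand in for the three separate calls. This is a design choice, not a requirement: the three-call version in the answer draws the same lines.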

I also made a few other changes to your code:

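  1. Renamed the third column to temp and used that name in the ggplot call (y=temp). Your original code uses y=df1[3], which reaches outside the plot's data to the df1 object in the calling environment; this can cause problems. A short name also makes it easier to build the dmax data frame and refer to its columns.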

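  2. Took the label.y positions in stat_compare_means from the dmax data frame, which cuts down the amount of code. (As an aside, stat_compare_means seems to require label.y positions supplied as fixed values rather than taking them from an aes mapping of the data.)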

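  3. Placed each p-value label a fixed absolute distance (the value e) above its pair of boxplots, rather than a multiplicative distance. This makes it easier to keep the spacing between the p-value labels, the brackets, and the boxplots consistent.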
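As a quick check on point 2, the sketch below (base R only, with tapply standing in for the dplyr summary) compares the question's four hand-written row ranges with a per-group maximum plus one offset. The offset e is set to 0.1 times the overall maximum here purely so the two versions can be compared; the answer itself uses e = 350.

```r
# Measured column from the question's data, in row order
temp <- c(518, 715, 572, 599, 614, 881, 743, 722, 779, 843,
          494, 610, 613, 624, 631, 925, 880, 932, 876, 926,
          1786, 2079, 2199, 2345, 2360, 2408, 2509, 3129, 3263, 3714,
          917, NA, 1066, 1059, 939, 1269, 1047, 974, 1048, 1084)
tissue <- gl(4, 10, labels = c("iLN", "Spleen", "Skin", "Colon"))

# Question's approach: one hard-coded row range per tissue
top <- max(temp, na.rm = TRUE)
old_label_y <- c(0.1 * top + max(temp[1:10],  na.rm = TRUE),
                 0.1 * top + max(temp[11:20], na.rm = TRUE),
                 0.1 * top + max(temp[21:30], na.rm = TRUE),
                 0.1 * top + max(temp[31:40], na.rm = TRUE))

# Grouped approach: per-tissue maxima (what dmax$temp holds) plus one offset
group_max <- tapply(temp, tissue, max, na.rm = TRUE)
e <- 0.1 * top  # chosen only to match the question's offset for comparison
new_label_y <- as.vector(group_max) + e
print(rbind(old_label_y, new_label_y))
```

The grouped version needs no row indices, so it keeps working if rows are reordered or tissues are added.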


library(dplyr)  # for %>%, group_by() and summarise()

# Use a short column name for the third column
names(df1)[3] = "temp"

# Generate data frame of reference y-values for p-value labels and bracket positions
dmax = df1 %>% group_by(Tissue) %>% 
  summarise(temp=max(temp, na.rm=TRUE),
            Genotype=NA)

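# For tweaking position of brackets
e = 350   # gap (data units) between each group's maximum and its p-value label
r = 0.6   # the bracket bar sits r*e above the group maximum, i.e. below the label
w = 0.19  # offset of each dodged box center from the tick mark (dodge width 0.75 / 4 = 0.1875)
bcol = "grey30"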

ggplot(df1, aes(x = Tissue, y = temp, color = Genotype, shape = Genotype)) +
  geom_boxplot(position = position_dodge(), outlier.shape = NA) +
  geom_point(position=position_dodge(width=0.75)) +
  ylim(0, 1.2*max(df1$temp, na.rm = TRUE)) +
  ylab('MFI CD127 (of CD4+ Foxp3+ T cells)') +
  scale_color_manual(values=color.groups) +
  scale_shape_manual(values=shape.groups) +
  theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
                     panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"),
                     axis.title.x=element_blank(), aspect.ratio = 1,
                     text = element_text(size = 9)) +
  stat_compare_means(show.legend = FALSE, label = 'p.format', method = 't.test',
                     label.y = e + dmax$temp) +
  # Horizontal bar of each bracket, spanning the two dodged box centers.
  # linewidth requires ggplot2 >= 3.4.0; on older versions use size=0.3 instead.
  geom_segment(data=dmax,
               aes(x=as.numeric(Tissue)-w, xend=as.numeric(Tissue)+w, 
                   y=temp + r*e, yend=temp + r*e), linewidth=0.3, color=bcol, inherit.aes=FALSE) +
  # Right end tick, dropping 60 units below the bar
  geom_segment(data=dmax,
               aes(x=as.numeric(Tissue) + w, xend=as.numeric(Tissue) + w, 
                   y=temp + r*e, yend=temp + r*e - 60), linewidth=0.3, color=bcol, inherit.aes=FALSE) +
  # Left end tick
  geom_segment(data=dmax,
               aes(x=as.numeric(Tissue) - w, xend=as.numeric(Tissue) - w, 
                   y=temp + r*e, yend=temp + r*e - 60), linewidth=0.3, color=bcol, inherit.aes=FALSE)

To address your comment, here is an example showing that the approach above automatically handles any number of x categories.

Let's start by adding two new tissue categories:

library(forcats)

df1$Tissue = fct_expand(df1$Tissue, "Tissue 5", "Tissue 6")
df1$Tissue[seq(1,20,4)] = "Tissue 5"
df1$Tissue[seq(21,40,4)] = "Tissue 6"

dmax = df1 %>% group_by(Tissue) %>% 
  summarise(temp=max(temp, na.rm=TRUE),
            Genotype=NA)

Running the plotting code exactly as listed above now produces the following plot: