How to add additional values to x-axis based on data column values?

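I am plotting scatter plots for multiple genes using a loop, producing one PNG file per gene. Each gene/PNG file contains two scatter plots: Group1 on the left and Group2 on the right. Each group contains both healthy and unhealthy samples. I have managed to get the code working up to this point.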

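What I need to do now is add the number of samples for the healthy and unhealthy conditions below each 'time point' on the x-axis. These counts come from the 'samples' column. For each time point, the label should read "(number of samples in healthy condition, number of samples in unhealthy condition)". Can anyone help me achieve this?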

My current example data frame 'data', covering 2 genes, looks like this:

Biomarkers  TimePoint   Group   Scale   Readings    Condition   samples
Gene1   52.5    Group1  25  0.027   Healthy 33
Gene1   52.5    Group2  25  0.024   Healthy 35
Gene1   57.5    Group1  25  0.029   Healthy 39
Gene1   57.5    Group2  25  0.023   Healthy 46
Gene1   62.5    Group1  25  0.030   Healthy 38
Gene1   62.5    Group2  25  0.024   Healthy 42
Gene1   67.5    Group1  25  0.033   Healthy 23
Gene1   67.5    Group2  25  0.026   Healthy 16
Gene2   52.5    Group1  25  0.051   Healthy 33
Gene2   52.5    Group2  25  0.046   Healthy 35
Gene2   57.5    Group1  25  0.052   Healthy 39
Gene2   57.5    Group2  25  0.048   Healthy 46
Gene2   62.5    Group1  25  0.049   Healthy 38
Gene2   62.5    Group2  25  0.051   Healthy 42
Gene2   67.5    Group1  25  0.051   Healthy 23
Gene2   67.5    Group2  25  0.052   Healthy 16
Gene1   52.5    Group1  25.01   0.026   Unhealthy   41
Gene1   52.5    Group2  25.01   0.023   Unhealthy   57
Gene1   57.5    Group1  25.01   0.027   Unhealthy   79
Gene1   57.5    Group2  25.01   0.024   Unhealthy   70
Gene1   62.5    Group1  25.01   0.030   Unhealthy   93
Gene1   62.5    Group2  25.01   0.025   Unhealthy   84
Gene1   67.5    Group1  25.01   0.033   Unhealthy   98
Gene1   67.5    Group2  25.01   0.022   Unhealthy   64
Gene2   52.5    Group1  25.01   0.043   Unhealthy   36
Gene2   52.5    Group2  25.01   0.044   Unhealthy   57
Gene2   57.5    Group1  25.01   0.043   Unhealthy   79
Gene2   57.5    Group2  25.01   0.043   Unhealthy   70
Gene2   62.5    Group1  25.01   0.043   Unhealthy   93
Gene2   62.5    Group2  25.01   0.044   Unhealthy   84
Gene2   67.5    Group1  25.01   0.044   Unhealthy   98
Gene2   67.5    Group2  25.01   0.044   Unhealthy   64
Gene1   52.5    Group1  50  0.035   Healthy 33
Gene1   52.5    Group2  50  0.029   Healthy 35
Gene1   57.5    Group1  50  0.039   Healthy 39
Gene1   57.5    Group2  50  0.031   Healthy 46
Gene1   62.5    Group1  50  0.038   Healthy 38
Gene1   62.5    Group2  50  0.030   Healthy 42
Gene1   67.5    Group1  50  0.040   Healthy 23
Gene1   67.5    Group2  50  0.035   Healthy 16
Gene2   52.5    Group1  50  0.058   Healthy 33
Gene2   52.5    Group2  50  0.053   Healthy 35
Gene2   57.5    Group1  50  0.059   Healthy 39
Gene2   57.5    Group2  50  0.056   Healthy 46
Gene2   62.5    Group1  50  0.057   Healthy 38
Gene2   62.5    Group2  50  0.058   Healthy 42
Gene2   67.5    Group1  50  0.061   Healthy 23
Gene2   67.5    Group2  50  0.058   Healthy 16
Gene1   52.5    Group1  50.01   0.038   Unhealthy   41
Gene1   52.5    Group2  50.01   0.030   Unhealthy   57
Gene1   57.5    Group1  50.01   0.038   Unhealthy   79
Gene1   57.5    Group2  50.01   0.031   Unhealthy   70
Gene1   62.5    Group1  50.01   0.040   Unhealthy   93
Gene1   62.5    Group2  50.01   0.032   Unhealthy   84
Gene1   67.5    Group1  50.01   0.043   Unhealthy   98
Gene1   67.5    Group2  50.01   0.033   Unhealthy   64
Gene2   52.5    Group1  50.01   0.052   Unhealthy   36
Gene2   52.5    Group2  50.01   0.051   Unhealthy   57
Gene2   57.5    Group1  50.01   0.052   Unhealthy   79
Gene2   57.5    Group2  50.01   0.051   Unhealthy   70
Gene2   62.5    Group1  50.01   0.052   Unhealthy   93
Gene2   62.5    Group2  50.01   0.052   Unhealthy   84
Gene2   67.5    Group1  50.01   0.053   Unhealthy   98
Gene2   67.5    Group2  50.01   0.051   Unhealthy   64
Gene1   52.5    Group1  75  0.045   Healthy 33
Gene1   52.5    Group2  75  0.038   Healthy 35
Gene1   57.5    Group1  75  0.048   Healthy 39
Gene1   57.5    Group2  75  0.041   Healthy 46
Gene1   62.5    Group1  75  0.047   Healthy 38
Gene1   62.5    Group2  75  0.040   Healthy 42
Gene1   67.5    Group1  75  0.050   Healthy 23
Gene1   67.5    Group2  75  0.043   Healthy 16
Gene2   52.5    Group1  75  0.066   Healthy 33
Gene2   52.5    Group2  75  0.064   Healthy 35
Gene2   57.5    Group1  75  0.065   Healthy 39
Gene2   57.5    Group2  75  0.064   Healthy 46
Gene2   62.5    Group1  75  0.068   Healthy 38
Gene2   62.5    Group2  75  0.071   Healthy 42
Gene2   67.5    Group1  75  0.070   Healthy 23
Gene2   67.5    Group2  75  0.071   Healthy 16
Gene1   52.5    Group1  75.01   0.057   Unhealthy   41
Gene1   52.5    Group2  75.01   0.041   Unhealthy   57
Gene1   57.5    Group1  75.01   0.056   Unhealthy   79
Gene1   57.5    Group2  75.01   0.040   Unhealthy   70
Gene1   62.5    Group1  75.01   0.057   Unhealthy   93
Gene1   62.5    Group2  75.01   0.043   Unhealthy   84
Gene1   67.5    Group1  75.01   0.059   Unhealthy   98
Gene1   67.5    Group2  75.01   0.046   Unhealthy   64
Gene2   52.5    Group1  75.01   0.063   Unhealthy   36
Gene2   52.5    Group2  75.01   0.060   Unhealthy   57
Gene2   57.5    Group1  75.01   0.061   Unhealthy   79
Gene2   57.5    Group2  75.01   0.062   Unhealthy   70
Gene2   62.5    Group1  75.01   0.062   Unhealthy   93
Gene2   62.5    Group2  75.01   0.062   Unhealthy   84
Gene2   67.5    Group1  75.01   0.061   Unhealthy   98
Gene2   67.5    Group2  75.01   0.060   Unhealthy   64
Gene1   52.5    Group1  100 0.056   Healthy 33
Gene1   52.5    Group2  100 0.046   Healthy 35
Gene1   57.5    Group1  100 0.063   Healthy 39
Gene1   57.5    Group2  100 0.048   Healthy 46
Gene1   62.5    Group1  100 0.060   Healthy 38
Gene1   62.5    Group2  100 0.052   Healthy 42
Gene1   67.5    Group1  100 0.064   Healthy 23
Gene1   67.5    Group2  100 0.055   Healthy 16
Gene2   52.5    Group1  100 0.082   Healthy 33
Gene2   52.5    Group2  100 0.074   Healthy 35
Gene2   57.5    Group1  100 0.070   Healthy 39
Gene2   57.5    Group2  100 0.075   Healthy 46
Gene2   62.5    Group1  100 0.074   Healthy 38
Gene2   62.5    Group2  100 0.078   Healthy 42
Gene2   67.5    Group1  100 0.080   Healthy 23
Gene2   67.5    Group2  100 0.075   Healthy 16
Gene1   52.5    Group1  100.01  0.090   Unhealthy   41
Gene1   52.5    Group2  100.01  0.060   Unhealthy   57
Gene1   57.5    Group1  100.01  0.093   Unhealthy   79
Gene1   57.5    Group2  100.01  0.053   Unhealthy   70
Gene1   62.5    Group1  100.01  0.089   Unhealthy   93
Gene1   62.5    Group2  100.01  0.057   Unhealthy   84
Gene1   67.5    Group1  100.01  0.089   Unhealthy   98
Gene1   67.5    Group2  100.01  0.065   Unhealthy   64
Gene2   52.5    Group1  100.01  0.074   Unhealthy   36
Gene2   52.5    Group2  100.01  0.074   Unhealthy   57
Gene2   57.5    Group1  100.01  0.077   Unhealthy   79
Gene2   57.5    Group2  100.01  0.078   Unhealthy   70
Gene2   62.5    Group1  100.01  0.073   Unhealthy   93
Gene2   62.5    Group2  100.01  0.073   Unhealthy   84
Gene2   67.5    Group1  100.01  0.072   Unhealthy   98
Gene2   67.5    Group2  100.01  0.074   Unhealthy   64

The dput output of my data is:

dput(data)
structure(list(Biomarkers = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Gene1", 
"Gene2"), class = "factor"), TimePoint = c(52.5, 52.5, 57.5, 
57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 
67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 
52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 
62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 
67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 
57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 
62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 
52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 
57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 
67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 
52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 
62.5, 62.5, 67.5, 67.5), Group = structure(c(1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("Group1", 
"Group2"), class = "factor"), Scale = c(25, 25, 25, 25, 25, 25, 
25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25.01, 25.01, 25.01, 
25.01, 25.01, 25.01, 25.01, 25.01, 25.01, 25.01, 25.01, 25.01, 
25.01, 25.01, 25.01, 25.01, 50, 50, 50, 50, 50, 50, 50, 50, 50, 
50, 50, 50, 50, 50, 50, 50, 50.01, 50.01, 50.01, 50.01, 50.01, 
50.01, 50.01, 50.01, 50.01, 50.01, 50.01, 50.01, 50.01, 50.01, 
50.01, 50.01, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 
75, 75, 75, 75, 75.01, 75.01, 75.01, 75.01, 75.01, 75.01, 75.01, 
75.01, 75.01, 75.01, 75.01, 75.01, 75.01, 75.01, 75.01, 75.01, 
100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 
100, 100, 100, 100.01, 100.01, 100.01, 100.01, 100.01, 100.01, 
100.01, 100.01, 100.01, 100.01, 100.01, 100.01, 100.01, 100.01, 
100.01, 100.01), Readings = c(0.027, 0.024, 0.029, 0.023, 0.03, 
0.024, 0.033, 0.026, 0.051, 0.046, 0.052, 0.048, 0.049, 0.051, 
0.051, 0.052, 0.026, 0.023, 0.027, 0.024, 0.03, 0.025, 0.033, 
0.022, 0.043, 0.044, 0.043, 0.043, 0.043, 0.044, 0.044, 0.044, 
0.035, 0.029, 0.039, 0.031, 0.038, 0.03, 0.04, 0.035, 0.058, 
0.053, 0.059, 0.056, 0.057, 0.058, 0.061, 0.058, 0.038, 0.03, 
0.038, 0.031, 0.04, 0.032, 0.043, 0.033, 0.052, 0.051, 0.052, 
0.051, 0.052, 0.052, 0.053, 0.051, 0.045, 0.038, 0.048, 0.041, 
0.047, 0.04, 0.05, 0.043, 0.066, 0.064, 0.065, 0.064, 0.068, 
0.071, 0.07, 0.071, 0.057, 0.041, 0.056, 0.04, 0.057, 0.043, 
0.059, 0.046, 0.063, 0.06, 0.061, 0.062, 0.062, 0.062, 0.061, 
0.06, 0.056, 0.046, 0.063, 0.048, 0.06, 0.052, 0.064, 0.055, 
0.082, 0.074, 0.07, 0.075, 0.074, 0.078, 0.08, 0.075, 0.09, 0.06, 
0.093, 0.053, 0.089, 0.057, 0.089, 0.065, 0.074, 0.074, 0.077, 
0.078, 0.073, 0.073, 0.072, 0.074), Condition = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Healthy", 
"Unhealthy"), class = "factor"), samples = c(33L, 35L, 39L, 46L, 
38L, 42L, 23L, 16L, 33L, 35L, 39L, 46L, 38L, 42L, 23L, 16L, 41L, 
57L, 79L, 70L, 93L, 84L, 98L, 64L, 36L, 57L, 79L, 70L, 93L, 84L, 
98L, 64L, 33L, 35L, 39L, 46L, 38L, 42L, 23L, 16L, 33L, 35L, 39L, 
46L, 38L, 42L, 23L, 16L, 41L, 57L, 79L, 70L, 93L, 84L, 98L, 64L, 
36L, 57L, 79L, 70L, 93L, 84L, 98L, 64L, 33L, 35L, 39L, 46L, 38L, 
42L, 23L, 16L, 33L, 35L, 39L, 46L, 38L, 42L, 23L, 16L, 41L, 57L, 
79L, 70L, 93L, 84L, 98L, 64L, 36L, 57L, 79L, 70L, 93L, 84L, 98L, 
64L, 33L, 35L, 39L, 46L, 38L, 42L, 23L, 16L, 33L, 35L, 39L, 46L, 
38L, 42L, 23L, 16L, 41L, 57L, 79L, 70L, 93L, 84L, 98L, 64L, 36L, 
57L, 79L, 70L, 93L, 84L, 98L, 64L)), class = "data.frame", row.names = c(NA, 
-128L))

My current code is as follows:

# Load libraries
library(ggplot2)
library(magrittr)
library(dplyr)
library(gridExtra)
library(grid)

proc_plot <- function(sub) {

  data_Group1 <- sub[sub$Group == "Group1", ]
  data_Group2 <- sub[sub$Group == "Group2", ]

  min_rdg <- min(data_Group1$Readings, data_Group2$Readings)
  max_rdg <- max(data_Group1$Readings, data_Group2$Readings)

  # Group1
  graph_Group1 <- ggplot(data_Group1, aes(x = TimePoint, y = Readings, group = Scale)) +
    labs(title="Group1", x="Time point", y="Readings") +
    scale_x_continuous(breaks = c(52.5, 57.5, 62.5, 67.5),
                       labels = c("1", "2", "3", "4")) +
    geom_line(aes(color = Scale, linetype=Condition), na.rm = TRUE, size = 0.8) +
    geom_point(aes(color = Scale),size = 2.5, na.rm = TRUE) +
    scale_color_continuous(name = "Scale", breaks = c(25, 50, 75, 100)) +
    scale_y_continuous(limits = c(min_rdg, max_rdg)) +
    theme(legend.key.height = unit(2.3, "cm"))

  # Group2
  graph_Group2 <- ggplot(data_Group2, aes(x = TimePoint, y = Readings, group = Scale)) +
    labs(title="Group2", x="Time point", y="Readings") +
    scale_x_continuous(breaks = c(52.5, 57.5, 62.5, 67.5),
                       labels = c("1", "2", "3", "4")) +
    geom_line(aes(color = Scale, linetype=Condition), na.rm = TRUE, size = 0.8) +
    geom_point(aes(color = Scale), size = 2.5, na.rm = TRUE) +
    scale_color_continuous(name = "Scale", breaks = c(25, 50, 75, 100)) +
    scale_y_continuous(limits = c(min_rdg, max_rdg)) +
    theme(legend.key.height = unit(2.3, "cm"))

  png (paste0("ScatterPlot_", sub$Biomarkers[[1]], ".png"), height=600, width=1111)
    output <- grid.arrange(graph_Group1, graph_Group2, nrow = 1, 
                           top=textGrob(sub$Biomarkers[[1]], gp=gpar(fontsize=20)))
  dev.off()

  return(output)
}


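# BUILD PLOT LIST AND PNG FILES
plot_list <- by(data, data$Biomarkers, proc_plot)

# Redraw each stored plot on a fresh page of the current device
# (dev.off() would fail here if no graphics device is open)
grid.newpage()
grid.draw(plot_list$Gene1)

grid.newpage()
grid.draw(plot_list$Gene2)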

I have also attached an example PNG file for Gene1 below. I added the red numbers by hand to show exactly what I need on each gene/PNG file (but in black).

Any help is appreciated. Thank you.

You can use \n to insert a line break in the labels. For example,

scale_x_continuous(breaks = c(52.5, 57.5, 62.5, 67.5),
                   labels = c("1\n(33, 41)", "2\n(39, 79)", "3\n(38, 93)", "4\n(23, 98)"))

You can build the labels programmatically like this:

lab_df = data_Group1 %>% group_by(TimePoint) %>% 
  summarize(label = sprintf("(%s, %s)", first(samples[Condition == "Healthy"]), first(samples[Condition == "Unhealthy"])))
lab_df                                                  
# # A tibble: 4 x 2
#   TimePoint label   
#       <dbl> <chr>   
# 1      52.5 (33, 41)
# 2      57.5 (39, 79)
# 3      62.5 (38, 93)
# 4      67.5 (23, 98)

ggplot(...) + ... +
  scale_x_continuous(
    breaks = lab_df$TimePoint,
    labels = paste(1:nrow(lab_df), lab_df$label, sep = "\n")
  )
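To try the label construction end to end, here is a minimal sketch on a cut-down copy of the question's data: only the Gene1 / Group1 rows at Scale 25 (Healthy) and 25.01 (Unhealthy). The names `toy` and `x_labels` are made up for this illustration; the readings and sample counts are copied from the table in the question.

```r
library(ggplot2)
library(dplyr)

# Toy subset: Gene1, Group1, Scale 25 (Healthy) and 25.01 (Unhealthy)
toy <- data.frame(
  TimePoint = rep(c(52.5, 57.5, 62.5, 67.5), times = 2),
  Scale     = rep(c(25, 25.01), each = 4),
  Readings  = c(0.027, 0.029, 0.030, 0.033, 0.026, 0.027, 0.030, 0.033),
  Condition = rep(c("Healthy", "Unhealthy"), each = 4),
  samples   = c(33L, 39L, 38L, 23L, 41L, 79L, 93L, 98L)
)

# One "(healthy, unhealthy)" label per time point
lab_df <- toy %>%
  group_by(TimePoint) %>%
  summarize(label = sprintf("(%s, %s)",
                            first(samples[Condition == "Healthy"]),
                            first(samples[Condition == "Unhealthy"])))

# Tick number on the first line, sample counts on the second
x_labels <- paste(seq_len(nrow(lab_df)), lab_df$label, sep = "\n")

p <- ggplot(toy, aes(x = TimePoint, y = Readings, group = Scale)) +
  geom_line(aes(linetype = Condition)) +
  geom_point() +
  scale_x_continuous(breaks = lab_df$TimePoint, labels = x_labels) +
  labs(x = "Time point", y = "Readings")

print(x_labels)
```

Printing `p` should show each tick number with its pair of counts on a second line underneath.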

Full-service solution: simplified to use a for loop rather than handling the groups separately, with the labels generated programmatically.

proc_plot <- function(sub) {
  lab_df = sub %>% group_by(TimePoint, Group) %>% 
    summarize(label = sprintf(
      "(%s, %s)", 
      first(samples[Condition == "Healthy"]),
      first(samples[Condition == "Unhealthy"])
    )) %>%
    arrange(Group, TimePoint) # make sure things are in order

  min_rdg <- min(sub$Readings)
  max_rdg <- max(sub$Readings)

  graphs = list()

  for (i in unique(sub$Group)) {
    this_lab = lab_df[lab_df$Group == i, ]
    graphs[[i]] =  ggplot(sub[sub$Group == i, ], aes(x = TimePoint, y = Readings, group = Scale)) +
      labs(title = i, x = "Time point", y = "Readings") +
      scale_x_continuous(breaks = this_lab$TimePoint,
                         labels = paste(1:nrow(this_lab), this_lab$label, sep = "\n")) +
      geom_line(aes(color = Scale, linetype=Condition), na.rm = TRUE, size = 0.8) +
      geom_point(aes(color = Scale),size = 2.5, na.rm = TRUE) +
      scale_color_continuous(name = "Scale", breaks = c(25, 50, 75, 100)) +
      scale_y_continuous(limits = c(min_rdg, max_rdg)) +
      theme(legend.key.height = unit(2.3, "cm"))
  }

  png (paste0("ScatterPlot_", sub$Biomarkers[[1]], ".png"), height=600, width=1111)
    output <- grid.arrange(grobs = graphs, nrow = 1, 
                           top = textGrob(sub$Biomarkers[[1]], gp = gpar(fontsize = 20)))
  dev.off()
  return(output)
}

proc_plot(data[data$Biomarkers == "Gene1", ])
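One thing to keep in mind: `first()` silently takes the first matching value, so the labels are only meaningful if `samples` is constant within each TimePoint / Group / Condition combination. That appears to hold in the example data, where the same count repeats across Scale levels. A possible sanity check is sketched below; the `toy` data frame and the names `check_df` and `n_values` are hypothetical, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy rows in the shape of the question's data: the same sample count
# repeats across Scale levels within a TimePoint/Group/Condition cell
toy <- data.frame(
  TimePoint = rep(c(52.5, 57.5), times = 4),
  Group     = "Group1",
  Scale     = rep(c(25, 25.01, 50, 50.01), each = 2),
  Condition = rep(rep(c("Healthy", "Unhealthy"), each = 2), times = 2),
  samples   = rep(c(33L, 39L, 41L, 79L), times = 2)
)

# Count how many distinct sample sizes appear in each cell
check_df <- toy %>%
  group_by(TimePoint, Group, Condition) %>%
  summarize(n_values = n_distinct(samples), .groups = "drop")

# Stop early if any cell has conflicting sample counts
stopifnot(all(check_df$n_values == 1))
```

Running the same check on the subset passed to `proc_plot` before plotting would flag any gene whose counts differ between Scale levels.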