Plotting columns with the same name in R

Question

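I have data like this:

I want to plot the columns that share a name together, for example all of the Enterobacteriaceae columns for treatment 1.

So it should look like this: the x-axis holds the treatments (treatment1_1, treatment1_2, and so on) and the y-axis holds the values. I would also like to add the median and a linear regression line.

The problem is that I keep getting errors because several columns have the same name, and R treats plotting multiple same-named columns together as a problem.

What should I do? Should I try to merge the columns that have the same name?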

Answer 1

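To plot each group and each column separately, you can put each one into a nested list so that we can make use of the purrr functions. Then create a ggplot object for each data frame.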

library(tidyverse)
library(ggpubr)

# First, split all columns into separate dataframes.
c_df <- df %>%
  map(function(x)
    as.data.frame(x)) %>%
  # Then, you can bind the treatment column back to those dataframes.
  map(function(x)
    cbind(x, df$treatment)) %>%
  # Remove "treatment" dataframe.
  head(-1) %>%
  # Then, split each dataframe into original-sample rows (FALSE) and treatment1 rows (TRUE).
  purrr::map(function(x)
    split(x, f = str_detect(df$treatment, "treatment1")))

# Getting the names of the taxon (i.e., original column heading).
taxa_names <- names(c_df) %>%
  rep(each = 2)

# Flatten list.
c_df <- c_df %>%
  purrr::flatten() %>%
  # Rename the 2 column names in all dataframes.
  map( ~ .x %>%
         dplyr::rename(value = "x", treatment = "df$treatment"))

# Replace the list names with the taxon names.
names(c_df) <- taxa_names

# Create a plotting function.
plot_treatment <- function(z, n) {
  ggplot(data = z, aes(x = treatment, y = value)) +
    geom_point() +
    theme_bw() +
    ggtitle(n)
}

# Use the plotting function to create all of the ggplot objects.
all_plots <- c_df %>%
  purrr::map2(.y = names(c_df), .f = plot_treatment)

# Plot the first four ggplot objects in one figure.
ggarrange(all_plots[[1]],
          all_plots[[2]],
          all_plots[[3]],
          all_plots[[4]],
          ncol = 2,
          nrow = 2)
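
The call above picks four plots by hand, although the flattened list holds more. As a minimal sketch, assuming ggpubr is installed, the whole list can be passed through the `plotlist` argument of `ggarrange()` instead. The `demo_plots` list and its panel titles are hypothetical stand-ins for `all_plots`, used only so the example runs on its own.

```r
library(ggplot2)
library(ggpubr)

# Hypothetical stand-in for `all_plots`: five small titled plots.
plot_names <- paste("panel", 1:5)
demo_plots <- lapply(plot_names, function(n) {
  ggplot(data.frame(x = 1:4, y = c(2, 4, 3, 5)), aes(x = x, y = y)) +
    geom_point() +
    theme_bw() +
    ggtitle(n)
})

# `plotlist` takes the whole list. With more plots than fit in a 2 x 2 grid,
# ggarrange() returns a list with one arranged figure per page.
pages <- ggarrange(plotlist = demo_plots, ncol = 2, nrow = 2)
```

Here `pages[[1]]` would hold the first four panels and `pages[[2]]` the remaining one; printing either element draws that page.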

Output (example)

Data

df <-
  structure(
    list(
      Enterobacteriaceae = c(
        0.60720596,
        0.17991846,
        0.76333618,
        0.34825876,
        0.60720596,
        0.17991846,
        0.76333618,
        0.34825876
      ),
      Enterobacteriaceae = c(
        0.05291531,
        0.38634377,
        0.622598,
        0.50749286,
        0.05291531,
        0.38634377,
        0.622598,
        0.50749286
      ),
      Enterobacteriaceae = c(
        0.3861723,
        0.466643,
        0.83439861,
        0.99024876,
        0.3861723,
        0.466643,
        0.83439861,
        0.99024876
      ),
      Methylococcaceae = c(
        0.49516461,
        0.16735156,
        0.77037345,
        0.50080786,
        0.49516461,
        0.16735156,
        0.77037345,
        0.50080786
      ),
      Methylococcaceae = c(
        0.18810595,
        0.7514854,
        0.05479668,
        0.11263293,
        0.18810595,
        0.7514854,
        0.05479668,
        0.11263293
      ),
      treatment = c(
        "Original Sample1",
        "Original Sample2",
        "Original Sample3",
        "Original Sample4",
        "treatment1_1",
        "treatment1_2",
        "treatment1_3",
        "treatment1_4"
      )
    ),
    class = "data.frame",
    row.names = c(NA, -8L)
  )

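In general, ggplot is easiest to use once the data are in long format, so that you can plot by group. I created some dummy data as an example. I'm still not sure whether this is the output you are looking for.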

library(tidyverse)

df %>%
  tidyr::pivot_longer(!treatment, names_to = "taxa", values_to = "value") %>%
  # You can change this to "Original" to get the other plot.
  dplyr::filter(str_detect(treatment, "treatment1")) %>%
  ggplot(aes(x = treatment, y = value, color = taxa)) +
  geom_point() +
  theme_bw()
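
The question also asks for a median and a linear regression line, which the plot above does not draw. The following is only a sketch of one way to add them, under two assumptions that are not in the original post: the digits after the underscore in each treatment label give the order of the treatments, and a straight-line fit across that order is what is wanted. The data values and the `step` column are hypothetical, and the reshape is done by position in base R so that the duplicated names are never looked up.

```r
library(ggplot2)

# Hypothetical example data with duplicated column names (not the asker's data).
df <- structure(
  list(
    Enterobacteriaceae = c(0.61, 0.18, 0.76, 0.35),
    Enterobacteriaceae = c(0.05, 0.39, 0.62, 0.51),
    Methylococcaceae   = c(0.50, 0.17, 0.77, 0.50),
    Methylococcaceae   = c(0.19, 0.75, 0.05, 0.11),
    treatment = c("treatment1_1", "treatment1_2", "treatment1_3", "treatment1_4")
  ),
  class = "data.frame",
  row.names = c(NA, -4L)
)

# Reshape to long format by column position rather than by name.
is_value <- names(df) != "treatment"
long <- data.frame(
  treatment = rep(df$treatment, times = sum(is_value)),
  taxa      = rep(names(df)[is_value], each = nrow(df)),
  value     = unlist(as.list(df)[is_value], use.names = FALSE)
)

# Assumption: the number after "_" orders the treatments (1, 2, 3, ...).
long$step <- as.integer(sub(".*_", "", long$treatment))

p <- ggplot(long, aes(x = step, y = value, colour = taxa)) +
  geom_point() +
  # Median of the same-named columns at each treatment, drawn as a cross.
  stat_summary(fun = median, geom = "point", shape = 4, size = 4) +
  # Least-squares line per taxon, without the confidence band.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  scale_x_continuous(breaks = unique(long$step),
                     labels = unique(long$treatment)) +
  theme_bw()
```

Printing `p` draws the figure. Because the x-axis has to be numeric for the fitted line, the treatment labels are put back on the axis through `scale_x_continuous()`.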

Output

Data

df <-
  structure(
    list(
      Enterobacteriaceae = c(0.60720596, 0.17991846, 0.76333618, 0.34825876),
      Enterobacteriaceae = c(0.05291531, 0.38634377, 0.622598, 0.50749286),
      Enterobacteriaceae = c(0.3861723, 0.466643, 0.83439861, 0.99024876),
      Methylococcaceae = c(0.49516461, 0.16735156, 0.77037345, 0.50080786),
      Methylococcaceae = c(0.18810595, 0.7514854, 0.05479668, 0.11263293),
      treatment = c(
        "treatment1_1",
        "treatment1_2",
        "treatment1_3",
        "treatment1_4"
      )
    ),
    class = "data.frame",
    row.names = c(NA, -4L)
  )
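
On the question of whether the same-named columns should be merged: one alternative, sketched here with hypothetical data, is to keep every column and give each a unique name with base R's `make.unique()`, while storing the original names as a taxon label. The `taxa` and `columns_by_taxon` objects are illustrative names, not part of the original answer.

```r
# Hypothetical data frame with duplicated column names.
df <- structure(
  list(
    Enterobacteriaceae = c(0.61, 0.18),
    Enterobacteriaceae = c(0.05, 0.39),
    Methylococcaceae   = c(0.50, 0.17),
    treatment          = c("treatment1_1", "treatment1_2")
  ),
  class = "data.frame",
  row.names = c(NA, -2L)
)

# With duplicated names, `$` only ever reaches the first matching column.
first_only <- df$Enterobacteriaceae

# Remember the taxon each column belongs to, then make the names unique.
taxa <- names(df)
names(df) <- make.unique(names(df), sep = "_")

# Lookup from taxon to the (now unique) columns that hold its values.
is_value <- taxa != "treatment"
columns_by_taxon <- split(names(df)[is_value], taxa[is_value])
```

After this step `columns_by_taxon$Enterobacteriaceae` lists both Enterobacteriaceae columns, so they can be selected together by name without any values being combined or dropped.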
