Using R, how to regroup multiple dataframe columns into a smaller number of new columns

Question

I'm working through some exercises from a very good online R/bioinformatics course. For this, I'm wrangling data in the form of a 'SummarizedExperiment' object from the Bioconductor package of the same name. Each row holds a gene name and that gene's expression values; the columns are 9 CTRL (control) samples, 9 'drug1'-treated samples and 9 'drug5'-treated samples.

The task is to regroup the data in this dataframe so that CTRL0_1 to CTRL0_9 are placed in a single column named 'CTRL0'. In the same fashion, new columns named 'DRUG1' and 'DRUG5' are needed, holding each gene's expression from columns DRUG1_1 to DRUG1_9 and DRUG5_1 to DRUG5_9, respectively. The data come from the final question on this webpage: https://uclouvain-cbio.github.io/WSBIM1207/sec-bioinfo.html

The goal is to reproduce a reference ggplot, but with my inelegant code I get a different plot. To generate my plot, I used the following code:

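library(dplyr)
library(tidyr)

# Each pivot_longer() keeps the columns not yet pivoted, so rows per gene multiply:
# 9 (CTRL0) x 9 (DRUG1) x 9 (DRUG5) = 729, then x 3 in the final pivot
table_gene_gp_expr <- geneCol_filter_assaySeDf %>% 
  pivot_longer(cols = CTRL0_1:CTRL0_9,
               names_to = "ctrl0",
               values_to = "ctrl0_expr") %>% 
  pivot_longer(cols = DRUG1_1:DRUG1_9,
               names_to = "drug1",
               values_to = "drug1_expr") %>% 
  pivot_longer(cols = DRUG5_1:DRUG5_9,
               names_to = "drug5",
               values_to = "drug5_expr") %>% 
  # Stack the three expression columns into one, labelled by group
  pivot_longer(cols = c(ctrl0_expr, drug1_expr, drug5_expr),
               names_to = "group",
               values_to = "gene_expression") %>% 
  select(gene, group, gene_expression)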

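I used successive pivot_longer() calls, but I'd like to know whether there is a more efficient way to produce the table I used for my ggplot. Note that the data in my ggplot differ from those in the correct ggplot, and I can't explain why. I can post the code I wrote to generate the first dataframe if it's really needed.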
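To see what the chained calls do, here is a minimal sketch on hypothetical toy data: the tibble `toy`, with 3 replicates per group instead of 9, is made up for illustration. It compares the chain of pivot_longer() calls above with a single pivot_longer() that splits each column name at "_" and keeps only the group part (`names_to = c("group", NA)`, where `NA` discards the replicate number), giving the same gene / group / gene_expression layout.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: 2 genes, 3 replicates per group (the real data has 9)
toy <- tibble(
  gene = c("gene1", "gene2"),
  CTRL0_1 = c(1, 2),   CTRL0_2 = c(3, 4),   CTRL0_3 = c(5, 6),
  DRUG1_1 = c(7, 8),   DRUG1_2 = c(9, 10),  DRUG1_3 = c(11, 12),
  DRUG5_1 = c(13, 14), DRUG5_2 = c(15, 16), DRUG5_3 = c(17, 18)
)

# Single pivot: "CTRL0_1" is split at "_"; the first part becomes `group`,
# the second part (replicate number) is dropped via NA
long_single <- toy %>%
  pivot_longer(cols = -gene,
               names_to = c("group", NA),
               names_sep = "_",
               values_to = "gene_expression")

# The chained approach from the question, adapted to 3 replicates
long_chained <- toy %>%
  pivot_longer(cols = CTRL0_1:CTRL0_3, names_to = "ctrl0", values_to = "ctrl0_expr") %>%
  pivot_longer(cols = DRUG1_1:DRUG1_3, names_to = "drug1", values_to = "drug1_expr") %>%
  pivot_longer(cols = DRUG5_1:DRUG5_3, names_to = "drug5", values_to = "drug5_expr") %>%
  pivot_longer(cols = c(ctrl0_expr, drug1_expr, drug5_expr),
               names_to = "group", values_to = "gene_expression") %>%
  select(gene, group, gene_expression)

nrow(long_single)   # 2 genes x 9 samples
nrow(long_chained)  # 2 genes x 3 x 3 x 3 combinations x 3 groups
```

With the chain, every value is repeated once per combination of the other groups' replicate columns (9 times here, 81 times with 9 replicates per group), whereas the single pivot gives exactly one row per gene and sample. Because every value is repeated equally often, per-group summaries such as boxplot quartiles are not necessarily affected, so this alone may not explain the difference between the plots; it does make the table far larger than needed. The group labels also differ: "ctrl0_expr" etc. in the chain versus "CTRL0" etc. in the single pivot.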

Answer 1

Given example data like this:

library(tidyverse)

test <- tibble(gene = c("gene1", "gene2"),
               CTRL0_1 = 1:2,
               CTRL0_2 = 3:4,
               DRUG1_1 = 5:6,
               DRUG1_2 = 7:8,
               DRUG5_1 = 9:10)

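Here are two syntax variants that pivot all the data columns longer and then spread them back out by group. If the data in these columns have different types, you'll need a different approach.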

test %>%
  pivot_longer(cols = -gene) %>%
  separate(name, c("group", "obs"), convert = TRUE) %>%
  pivot_wider(names_from = group, values_from = value)

test %>%
  pivot_longer(cols = -gene, names_to = c("group", "obs"), names_sep = "_") %>%
  pivot_wider(names_from = group, values_from = value)
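Both variants can also be collapsed into a single call using pivot_longer()'s ".value" sentinel, which turns the part of each column name before "_" into an output column name. A minimal sketch, reusing the example data above; `names_transform` converts `obs` to an integer, mirroring `convert = TRUE` in the first variant:

```r
library(dplyr)
library(tidyr)

# Same example data as above
test <- tibble(gene = c("gene1", "gene2"),
               CTRL0_1 = 1:2,
               CTRL0_2 = 3:4,
               DRUG1_1 = 5:6,
               DRUG1_2 = 7:8,
               DRUG5_1 = 9:10)

# ".value": "CTRL0", "DRUG1", "DRUG5" become columns;
# the part after "_" goes into `obs`
wide_by_group <- test %>%
  pivot_longer(cols = -gene,
               names_to = c(".value", "obs"),
               names_sep = "_",
               names_transform = list(obs = as.integer))

wide_by_group
```

As with the two variants above, DRUG5 is NA for obs 2 because the example data only has DRUG5_1.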