Aggregate columns by character

Hello, I would like to aggregate multiple columns.

d <- structure(list(Gene = structure(1:3, .Label = c("k141_20041_1", 
"k141_27047_2", "k141_70_3"), class = "factor"), phylum = structure(c(1L, 
1L, 1L), .Label = "Firmicutes", class = "factor"), class = structure(c(1L, 
1L, 1L), .Label = "Bacillales", class = "factor"), order = structure(c(1L, 
1L, 1L), .Label = "Bacilli", class = "factor"), family = structure(c(1L, 
1L, 1L), .Label = "Bacillaceae", class = "factor"), genus = structure(c(1L, 
1L, 1L), .Label = "Bacillus", class = "factor"), species = structure(c(1L, 
1L, 2L), .Label = c("Bacillus subtilis", "unknown"), class = "factor"), 
    SampleA = c(0, 0, 0), SampleB = c(0, 0, 0), SampleCtrl = c(3.98888888888889, 
    11.5555555555556, 3.35978835978836)), .Names = c("Gene", 
"phylum", "class", "order", "family", "genus", "species", "SampleA", 
"SampleB", "SampleCtrl"), row.names = c(21918L, 40410L, 40857L
), class = "data.frame")

This is the input data frame to aggregate:

        Gene     phylum      class   order      family    genus           species SampleA SampleB
k141_20041_1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0
k141_27047_2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0
   k141_70_3 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0       0
  SampleCtrl
        3.99
       11.56
        3.36

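What I want in the end is one row for each distinct combination of the taxonomy columns, with the sample columns summed. In this case it would look like this (the Gene column can be dropped).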

    phylum      class      order   family      genus    species           SampleA SampleB SampleCtrl
    Firmicutes  Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0      15.54
    Firmicutes  Bacillales Bacilli Bacillaceae Bacillus unknown                 0       0       3.36

Note that this is a very simplified example. My original data frame has 20 samples and more than 500 species.

Here is a dplyr solution:

library(dplyr)

d %>%
  group_by(phylum, class, order, family, genus, species) %>%
  summarise_if(is.numeric, sum)

# Groups: phylum, class, order, family, genus [?]

      phylum      class   order      family    genus           species SampleA SampleB SampleCtrl
      <fctr>     <fctr>  <fctr>      <fctr>   <fctr>            <fctr>   <dbl>   <dbl>      <dbl>
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0   15.54444
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0       0    3.35979
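In dplyr 1.0.0 and later the scoped verbs such as summarise_if() are superseded by across(). As a minimal sketch, assuming a recent dplyr is installed, the same grouping and summing could be written as below. The data frame here is a shortened copy of the question's data with rounded values, built only so the example runs on its own.

```r
library(dplyr)

# Shortened copy of the question's data (values rounded for brevity)
d <- data.frame(
  Gene = c("k141_20041_1", "k141_27047_2", "k141_70_3"),
  phylum = "Firmicutes", class = "Bacillales", order = "Bacilli",
  family = "Bacillaceae", genus = "Bacillus",
  species = c("Bacillus subtilis", "Bacillus subtilis", "unknown"),
  SampleA = 0, SampleB = 0,
  SampleCtrl = c(3.99, 11.56, 3.36)
)

res <- d %>%
  group_by(phylum, class, order, family, genus, species) %>%
  # Sum every numeric column within each group; Gene is neither numeric
  # nor a grouping column, so it is dropped from the result
  summarise(across(where(is.numeric), sum), .groups = "drop")

res
```

Because the numeric columns are selected by type, the same call should carry over to the full data with 20 sample columns without listing them by name.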

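Assuming that the sample columns are the numeric columns and the others are not, and that the desired aggregation is to sum each sample column grouped by the other columns (except Gene):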

j <- which(sapply(d, is.numeric))
aggregate(d[j], d[-c(1, j)], sum)

giving:

      phylum      class   order      family    genus           species SampleA
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0
  SampleB SampleCtrl
1       0  15.544444
2       0   3.359788
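For comparison, aggregate() also has a formula interface, where a dot on the left-hand side stands for every column not named on the right. The sketch below is an alternative spelling of the same sum, not part of the original answer. It uses a shortened copy of the question's data with rounded values. One caveat: the formula method drops rows containing NA by default (na.action = na.omit), so it may behave differently from the call above on data with missing values.

```r
# Shortened copy of the question's data (values rounded for brevity)
d <- data.frame(
  Gene = c("k141_20041_1", "k141_27047_2", "k141_70_3"),
  phylum = "Firmicutes", class = "Bacillales", order = "Bacilli",
  family = "Bacillaceae", genus = "Bacillus",
  species = c("Bacillus subtilis", "Bacillus subtilis", "unknown"),
  SampleA = 0, SampleB = 0,
  SampleCtrl = c(3.99, 11.56, 3.36)
)

# d[-1] removes Gene so that "." covers only the sample columns
res <- aggregate(. ~ phylum + class + order + family + genus + species,
                 data = d[-1], FUN = sum)

res
```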

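If the sample columns all have Sample in their names and the other columns do not, another possibility is to use this in place of the first line above: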

j <- grep("Sample", names(d))

Or, if neither of the above assumptions holds but we know that the sample columns are the last 3 columns, then:

j <- seq(to = ncol(d), length.out = 3)
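To see how the three ways of choosing j relate, the sketch below computes all three on a shortened copy of the question's data (values rounded) and checks that they pick the same columns before aggregating. The names j_numeric, j_name and j_last are illustrative only and do not appear in the original answer.

```r
# Shortened copy of the question's data (values rounded for brevity)
d <- data.frame(
  Gene = c("k141_20041_1", "k141_27047_2", "k141_70_3"),
  phylum = "Firmicutes", class = "Bacillales", order = "Bacilli",
  family = "Bacillaceae", genus = "Bacillus",
  species = c("Bacillus subtilis", "Bacillus subtilis", "unknown"),
  SampleA = 0, SampleB = 0,
  SampleCtrl = c(3.99, 11.56, 3.36)
)

j_numeric <- which(sapply(d, is.numeric))       # select by column type
j_name    <- grep("Sample", names(d))           # select by column name
j_last    <- seq(to = ncol(d), length.out = 3)  # select by position

# On this data all three selectors point at the same columns
stopifnot(all(j_numeric == j_name), all(j_name == j_last))

# Sum the sample columns, grouped by every other column except Gene (column 1)
res <- aggregate(d[j_name], d[-c(1, j_name)], sum)
res
```

Which selector is safest depends on the real data: selecting by type breaks if a taxonomy column is numeric, selecting by name breaks if a sample column lacks the Sample prefix, and selecting by position breaks if the column order changes.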

Update: Fixed and added two alternatives.