Count values shared between groups

Here is some dummy data:

class<-c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av")
otu<-c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av")
value<-c(0,1,12,13,300,1,2,3,4,0,0,2,4)
type<-c("b","c","d","a","b","c","d","d","d","c","b","a","a")
location<-c("b","c","d","a","b","d","d","d","d","c","b","a","a")
datafr1<-data.frame(class,otu,value,type,location)

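If any replicate within a group (defined by 'location' and 'type') has a value of 0, I want to remove that OTU, because I am only interested in OTUs that are shared among all replicates within a group.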

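I want to calculate two things. One: the percent abundance of 'value' for all OTUs shared within each 'location' and 'type' group (abundance). Two: the number of shared OTUs in each class (otu.freq).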

Note that I want the OTUs to be grouped by 'class', not by OTU name (the name carries no meaning here).

Expected output:

   class location type  abundance  otu.freq
    ab        a    a      79        2
    av        a    a      21        1
    ab        b    b     100        1
    ab        c    c     100        1
    ad        d    c     100        1
    ab        d    d      24        2         
    ad        d    d      76        2
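
To make the arithmetic behind this table concrete, here is a minimal base R sketch for the location "a" / type "a" group only. It assumes that zero-valued rows are dropped first and that abundance is each class's share of the group total, rounded to a whole percent. The names grp, pct and freq are illustrative and not part of the original code.

```r
# Dummy data from the question (kept as character columns)
class    <- c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av")
otu      <- c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av")
value    <- c(0,1,12,13,300,1,2,3,4,0,0,2,4)
type     <- c("b","c","d","a","b","c","d","d","d","c","b","a","a")
location <- c("b","c","d","a","b","d","d","d","d","c","b","a","a")
datafr1  <- data.frame(class, otu, value, type, location,
                       stringsAsFactors = FALSE)

# Non-zero rows of the location "a" / type "a" group
grp <- subset(datafr1, location == "a" & type == "a" & value != 0)

# Each class's share of the group total, as a rounded percentage
pct <- round(100 * tapply(grp$value, grp$class, sum) / sum(grp$value))

# Number of non-zero rows per class in this group
freq <- table(grp$class)

print(pct)   # ab = 79, av = 21  (15/19 and 4/19)
print(freq)  # ab = 2,  av = 1
```

The same calculation applied to every location/type group gives the other rows of the table.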

I have a much larger data frame and tried the suggestions that use dplyr, but I ran out of RAM, so I don't know whether they work.

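The solution provided by @Akron below does not count rows with an abundance of 0, but it does not remove that OTU from the other replicates within the group. If any OTU has an abundance of 0, it is not shared within that group, and I need to exclude it entirely from both the abundance and the otu.freq calculations.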

library(dplyr)    
so_many_shared3<-datafr1 %>% 
      group_by(class, location, type) %>% 
      summarise(abundance=sum(value)/sum(datafr1[['value']])*100, otu.freq=sum(value !=0))


   class location type  abundance  otu.freq
1    ab        a    a  4.3859649     2
2    ab        b    b 87.7192982     1
3    ab        c    c  0.2923977     1
4    ab        d    d  1.4619883     2
5    ad        b    b  0.0000000     0
6    ad        d    c  0.2923977     1
7    ad        d    d  4.6783626     2
8    av        a    a  1.1695906     1

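Your aggregate call is wrong. To count how often each OTU occurs, put otu on the left-hand side of the "~". You can then merge the two results with the join function from plyr:
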
# abund_shared is assumed to be the asker's existing abundance table (its code is not shown here)
abund_shared_freq<-aggregate(otu~class+location+type,datafr1,length)
library(plyr)
join(abund_shared, abund_shared_freq, by=c("class", "location","type"), type="left")

Output:

  class location type  abundance otu
1    ab        a    a  4.3859649   2
2    ab        b    b 87.7192982   2
3    ab        c    c  0.2923977   2
4    ab        d    d  1.4619883   2
5    ad        b    b  0.0000000   1
6    ad        d    c  0.2923977   1
7    ad        d    d  4.6783626   2
8    av        a    a  1.1695906   1
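
Since abund_shared is not defined anywhere in this thread, the following is a self-contained sketch of the same idea. It assumes abund_shared was built with aggregate() as each group's share of the overall total (which matches the abundance column above), and it swaps plyr's join for base R's merge to avoid an extra dependency. The variable total and the renaming step are illustrative.

```r
# Dummy data from the question
class    <- c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av")
otu      <- c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av")
value    <- c(0,1,12,13,300,1,2,3,4,0,0,2,4)
type     <- c("b","c","d","a","b","c","d","d","d","c","b","a","a")
location <- c("b","c","d","a","b","d","d","d","d","c","b","a","a")
datafr1  <- data.frame(class, otu, value, type, location,
                       stringsAsFactors = FALSE)

# Assumed definition: group sum as a percentage of the overall total
total <- sum(datafr1$value)
abund_shared <- aggregate(value ~ class + location + type, datafr1,
                          function(x) sum(x) / total * 100)
names(abund_shared)[names(abund_shared) == "value"] <- "abundance"

# Row count per group: otu goes on the left of "~", length() counts rows
abund_shared_freq <- aggregate(otu ~ class + location + type, datafr1, length)

# Base R equivalent of the join on the three grouping columns
merged <- merge(abund_shared, abund_shared_freq,
                by = c("class", "location", "type"))
print(merged)
```

Note that length() counts every row in a group, including those with a value of 0, which is why ab/b/b shows 2 here.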

You can do this in one step with data.table:

library(data.table)
val = sum(datafr1$value)
setDT(datafr1)[order(class,type), list(abundance = 
               sum(value)/val*100, otu.freq = .N), 
               by = .(class, location, type)]

Or using dplyr:

library(dplyr)
datafr1 %>% 
     group_by(class, location, type) %>% 
     summarise(abundance=sum(value)/sum(datafr1[['value']])*100, otu.freq=n())
 #   class location type  abundance otu.freq
 #1    ab        a    a  4.3859649        2
 #2    ab        b    b 87.7192982        2
 #3    ab        c    c  0.2923977        2
 #4    ab        d    d  1.4619883        2
 #5    ad        b    b  0.0000000        1
 #6    ad        d    c  0.2923977        1
 #7    ad        d    d  4.6783626        2
 #8    av        a    a  1.1695906        1

Update

Based on the new criteria, I am updating the answer with the code suggested by the OP (@K.Brannen):

  datafr1 %>%
       group_by(class, location, type) %>% 
       summarise(abundance=sum(value)/sum(datafr1[['value']])*100, 
             otu.freq=sum(value !=0)) 
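
The only change from the earlier version is otu.freq: n() counts every row in a group, while sum(value != 0) counts only the non-zero ones. A tiny illustration using the two values of class "ab" at location "b", type "b" (the variable nonzero is just for illustration):

```r
value <- c(0, 300)        # the two "ab" rows at location "b", type "b"

nonzero <- value != 0     # logical vector: FALSE TRUE
n_nonzero <- sum(nonzero) # TRUE counts as 1, so this counts non-zero rows
n_rows <- length(value)   # what n() would report for the group

print(n_nonzero)  # 1
print(n_rows)     # 2
```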

Update 2

Based on the updated expected result:

  datafr1 %>%
       filter(value!=0) %>% 
       group_by(location, type) %>% 
       mutate(value1=sum(value)) %>% 
       group_by(class, .add=TRUE) %>%   # dplyr >= 1.0.0; older versions use add=TRUE
       summarise(abundance=round(100*sum(value)/unique(value1)), 
                         otu.freq=n())
  #    location type class abundance otu.freq
  #1        a    a    ab        79        2
  #2        a    a    av        21        1
  #3        b    b    ab       100        1
  #4        c    c    ab       100        1
  #5        d    c    ad       100        1
  #6        d    d    ab        24        2
  #7        d    d    ad        76        2
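
One caveat: filter(value != 0) drops only the zero rows themselves. If the stricter reading of the question is intended (drop an OTU from a location/type group entirely as soon as any of its replicates there is 0), a possible sketch is below. It assumes that otu identifies the OTU within a group and that each row is one replicate; shared, group_total and result are illustrative names. On this dummy data it happens to give the same result as above, because no OTU has both zero and non-zero rows in the same group.

```r
library(dplyr)

# Dummy data from the question
datafr1 <- data.frame(
  class    = c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av"),
  otu      = c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av"),
  value    = c(0,1,12,13,300,1,2,3,4,0,0,2,4),
  type     = c("b","c","d","a","b","c","d","d","d","c","b","a","a"),
  location = c("b","c","d","a","b","d","d","d","d","c","b","a","a"),
  stringsAsFactors = FALSE
)

# Keep an OTU in a location/type group only if all of its rows are non-zero
shared <- datafr1 %>%
  group_by(location, type, otu) %>%
  filter(all(value != 0)) %>%
  ungroup()

# Percent of the group total per class, plus the number of remaining rows
result <- shared %>%
  group_by(location, type) %>%
  mutate(group_total = sum(value)) %>%
  group_by(location, type, class) %>%
  summarise(abundance = round(100 * sum(value) / first(group_total)),
            otu.freq  = n()) %>%
  ungroup()

print(result)
```

With real data it would be worth checking both variants on a group where an OTU has a mix of zero and non-zero replicates, since that is where they differ.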