Count values shared between groups
Here is some dummy data:
class<-c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av")
otu<-c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av")
value<-c(0,1,12,13,300,1,2,3,4,0,0,2,4)
type<-c("b","c","d","a","b","c","d","d","d","c","b","a","a")
location<-c("b","c","d","a","b","d","d","d","d","c","b","a","a")
datafr1<-data.frame(class,otu,value,type,location)
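If any replicate of an OTU within a 'location' and 'type' group has a value of 0, I want to drop that OTU, because I am only interested in OTUs shared among all replicates within a group.
I want to calculate two things.
One: the percent abundance of 'value' for all OTUs shared within each 'location' and 'type' group (abundance).
Two: the number of shared OTUs in each class (otu.freq).
Note that I want the OTUs classified by 'class', not by OTU name (the name carries no meaning).
Expected output: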
class location type abundance otu.freq
ab a a 79 2
av a a 21 1
ab b b 100 1
ab c c 100 1
ad d c 100 1
ab d d 24 2
ad d d 76 2
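My real data frame is much larger. I tried the suggested dplyr approach on it but ran out of RAM, so I don't know whether it works.
The solution provided by @Akron below no longer counts rows where the abundance is 0, but it does not remove that OTU from the other replicates within the group. If any replicate of an OTU has an abundance of 0, that OTU is not shared across the group, and I need to exclude it entirely from both the abundance and the otu.freq calculations.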
library(dplyr)
so_many_shared3 <- datafr1 %>%
  group_by(class, location, type) %>%
  summarise(abundance = sum(value) / sum(datafr1[['value']]) * 100,
            otu.freq = sum(value != 0))
class location type abundance otu.freq
1 ab a a 4.3859649 2
2 ab b b 87.7192982 1
3 ab c c 0.2923977 1
4 ab d d 1.4619883 2
5 ad b b 0.0000000 0
6 ad d c 0.2923977 1
7 ad d d 4.6783626 2
8 av a a 1.1695906 1
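Your aggregate call is wrong. To count how often each OTU occurs, put otu on the left-hand side of the "~". You can then merge the two results with the join function from the plyr library: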
abund_shared_freq<-aggregate(otu~class+location+type,datafr1,length)
library(plyr)
join(abund_shared, abund_shared_freq, by=c("class", "location","type"), type="left")
Output:
class location type abundance otu
1 ab a a 4.3859649 2
2 ab b b 87.7192982 2
3 ab c c 0.2923977 2
4 ab d d 1.4619883 2
5 ad b b 0.0000000 1
6 ad d c 0.2923977 1
7 ad d d 4.6783626 2
8 av a a 1.1695906 1
You can do this in one step with data.table:
library(data.table)
val = sum(datafr1$value)
setDT(datafr1)[order(class,type), list(abundance =
sum(value)/val*100, otu.freq = .N),
by = .(class, location, type)]
Or with dplyr:
library(dplyr)
datafr1 %>%
group_by(class, location, type) %>%
summarise(abundance=sum(value)/sum(datafr1[['value']])*100, otu.freq=n())
# class location type abundance otu.freq
#1 ab a a 4.3859649 2
#2 ab b b 87.7192982 2
#3 ab c c 0.2923977 2
#4 ab d d 1.4619883 2
#5 ad b b 0.0000000 1
#6 ad d c 0.2923977 1
#7 ad d d 4.6783626 2
#8 av a a 1.1695906 1
Update
Based on the new criteria, I am updating the answer with the code suggested by the OP (@K.Brannen):
datafr1 %>%
group_by(class, location, type) %>%
summarise(abundance=sum(value)/sum(datafr1[['value']])*100,
otu.freq=sum(value !=0))
Update 2
Based on the updated expected result:
datafr1 %>%
  filter(value != 0) %>%
  group_by(location, type) %>%
  mutate(value1 = sum(value)) %>%
  # dplyr >= 1.0.0 spells this argument .add; older versions used add = TRUE
  group_by(class, .add = TRUE) %>%
  summarise(abundance = round(100 * sum(value) / unique(value1)),
            otu.freq = n())
# location type class abundance otu.freq
#1 a a ab 79 2
#2 a a av 21 1
#3 b b ab 100 1
#4 c c ab 100 1
#5 d c ad 100 1
#6 d d ab 24 2
#7 d d ad 76 2
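Update 2 drops only the zero rows themselves. The question also asks that an OTU be removed from the other replicates of its group once any replicate is 0. A minimal base R sketch of that stricter rule follows. It assumes that "replicates" means rows sharing the same location, type and otu; the helper names key, zero_keys and shared are made up for this illustration. On the dummy data every zero happens to be the only row of its location/type/otu combination, so the result should match the expected output either way.

```r
# Rebuild the dummy data from the question.
datafr1 <- data.frame(
  class    = c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av"),
  otu      = c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av"),
  value    = c(0,1,12,13,300,1,2,3,4,0,0,2,4),
  type     = c("b","c","d","a","b","c","d","d","d","c","b","a","a"),
  location = c("b","c","d","a","b","d","d","d","d","c","b","a","a")
)

# Assumption: replicates are the rows sharing location, type and otu.
key <- paste(datafr1$location, datafr1$type, datafr1$otu, sep = "|")

# Every location/type/otu combination with at least one zero replicate.
zero_keys <- unique(key[datafr1$value == 0])

# Drop all rows of those combinations, not only the zero rows.
shared <- datafr1[!(key %in% zero_keys), ]

# Percent of each row relative to the total of its location/type group.
group_total <- ave(shared$value, shared$location, shared$type, FUN = sum)
shared$abundance <- 100 * shared$value / group_total
shared$otu.freq <- 1  # one count per remaining row

# Sum percentages and counts per class within each location/type group.
result <- aggregate(cbind(abundance, otu.freq) ~ class + location + type,
                    data = shared, FUN = sum)
result$abundance <- round(result$abundance)
result
```

Because this avoids the grouped mutate, it may also use less memory on a large data frame, though that has not been measured here.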