Phyloseq: how to obtain relative abundance with merge_samples?
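I am trying to obtain relative abundances using the merge_samples option of the phyloseq package.
First I computed the mean of each phylum across all samples. I will use GlobalPatterns as an example; it has 26 samples, so I did something like this: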
library(phyloseq)
library(plyr)
data(GlobalPatterns)
TGroup <- tax_glom(GlobalPatterns, taxrank = "Phylum")
PGroup <- transform_sample_counts(TGroup, function(x)100* x / sum(x))
OTUg <- otu_table(PGroup)
TAXg <- tax_table(PGroup)[,"Phylum"]
AverageD <- as.data.frame(rowMeans(OTUg))
names(AverageD) <- c("Mean")
GTable <- merge(TAXg, AverageD, by=0, all=TRUE)
GTable$Row.names = NULL
GTable <- GTable[order(desc(GTable$Mean)),]
head(GTable)
I get something like:
Phylum Mean
1 Proteobacteria 29.45550
2 Firmicutes 18.87905
3 Bacteroidetes 17.34374
4 Cyanobacteria 13.70639
5 Actinobacteria 8.93446
6....... More.....
That looks fine to me.
But when I merge the samples with merge_samples (by SampleType):
ps <- tax_glom(GlobalPatterns, "Phylum")
ps0 <- transform_sample_counts(ps, function(x)100* x / sum(x))
ps1 <- merge_samples(ps0, "SampleType")
ps2 <- transform_sample_counts(ps1, function(x)100* x / sum(x))
ps3 <- ps2
otu_table(ps3) <- t(otu_table(ps3)) # transpose the matrix otus !!!
OTUg <- otu_table(ps3)
TAXg <- tax_table(ps3)[,"Phylum"]
GTable <- merge(TAXg, OTUg, by=0, all=TRUE)
GTable$Row.names = NULL
GTable$Mean=rowMeans(GTable[,-c(1)], na.rm=TRUE)
GTable <- GTable[order(desc(GTable$Mean)),]
head(GTable)
I get the same taxa, but the percentages in the Mean column are different:
          Phylum Feces Freshwater Freshwater (creek)  Mock Ocean Sediment (estuary)  Skin  Soil Tongue  Mean
1 Proteobacteria  1.58      16.71              18.61 20.10 38.00              71.03 31.98 32.66  44.49 30.57
2     Firmicutes 54.82       0.12               0.65 41.42  0.08               2.53 30.67  0.64  21.67 16.96
3  Bacteroidetes 35.23      11.92               5.07 24.97 31.17               7.01  9.09  9.90  12.28 16.29
4  Cyanobacteria  2.63      30.17              62.57  0.16 19.18               3.24  4.65  0.97   6.61 14.46
5 Actinobacteria  3.47      37.11               1.74  8.39  5.12               1.04 16.78  9.99   7.49 10.13
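I understand that merging by SampleType collapses the samples into one column per sample type, so the percentage of each phylum within each column (Feces, Freshwater, Freshwater (creek), ...) changes. But I expected the overall mean of each phylum to stay the same even after merging, and here the means differ (Proteobacteria 30.57, Firmicutes 16.96, Bacteroidetes 16.29, ...).
Any solutions or suggestions?
Thanks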
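In the first part, you take the mean over all samples. In the second, you take the mean of the group means. The two are equivalent only when every group has the same number of observations.
For example: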
# equal n for each group
abundance = seq(0.1,0.6,by=0.1)
group = rep(letters[1:3],each=2)
mean(tapply(abundance,group,mean)) == mean(abundance)
[1] TRUE
# unequal n
abundance = seq(0.1,0.6,by=0.1)
group = rep(letters[1:3],1:3)
mean(tapply(abundance,group,mean)) == mean(abundance)
[1] FALSE
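As a small sketch of the same point, the overall mean can be recovered from the group means by weighting each one by its group size. This reuses the unequal-n toy data above; the variable names group_means, group_n, unweighted and weighted are only illustrative. It also compares with all.equal() rather than ==, since exact equality on floating-point means can fail even when the values agree mathematically.

```r
abundance <- seq(0.1, 0.6, by = 0.1)
group <- rep(letters[1:3], 1:3)   # group sizes 1, 2 and 3

group_means <- tapply(abundance, group, mean)
group_n <- tapply(abundance, group, length)

# Plain mean of the group means: every group counts equally.
unweighted <- mean(group_means)

# Weighted by group size: every observation counts equally.
weighted <- weighted.mean(group_means, group_n)

isTRUE(all.equal(weighted, mean(abundance)))    # the weighted version matches
isTRUE(all.equal(unweighted, mean(abundance)))  # the unweighted one does not
```

Which of the two is the "right" summary depends on whether each sample or each sample type should carry equal weight.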
The number of samples (n) differs between sample types:
TGroup <- tax_glom(GlobalPatterns, taxrank = "Phylum")
PGroup <- transform_sample_counts(TGroup, function(x)100* x / sum(x))
SampleType = sample_data(PGroup)$SampleType
table(SampleType)
SampleType
             Feces         Freshwater Freshwater (creek)               Mock
                 4                  2                  3                  3
             Ocean Sediment (estuary)               Skin               Soil
                 3                  3                  3                  3
            Tongue
                 2
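These counts are enough to reconcile the two numbers from the question. As a rough check, the sketch below takes the Proteobacteria percentages per sample type from the merged table in the question (rounded to two decimals, so the results are approximate) and compares the plain mean with the mean weighted by the counts above. The names group_means and n are illustrative only.

```r
# Proteobacteria percentage per SampleType, copied (rounded) from the merged table.
group_means <- c(Feces = 1.58, Freshwater = 16.71, `Freshwater (creek)` = 18.61,
                 Mock = 20.10, Ocean = 38.00, `Sediment (estuary)` = 71.03,
                 Skin = 31.98, Soil = 32.66, Tongue = 44.49)

# Number of samples per SampleType, from table(SampleType); 26 in total.
n <- c(4, 2, 3, 3, 3, 3, 3, 3, 2)

mean(group_means)              # about 30.57, the Mean column after merging
weighted.mean(group_means, n)  # about 29.46, the mean over all 26 samples
```

So the two results describe the same data; they differ only in whether each sample type or each sample gets equal weight.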
To reproduce the mean abundance obtained from the merged samples, compute the mean abundance within each SampleType and then average those means:
mean_PGroup = sapply(levels(SampleType),function(i){
rowMeans(otu_table(PGroup)[,SampleType==i])
})
phy = tax_table(PGroup)[rownames(mean_PGroup ),"Phylum"]
rownames(mean_PGroup) = phy
head(sort(rowMeans(mean_PGroup),decreasing=TRUE))
 Proteobacteria      Firmicutes   Bacteroidetes   Cyanobacteria  Actinobacteria
      30.572773       16.956254       16.293286       14.463643       10.126875
Verrucomicrobia
       2.774216
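This matches the merged result because, as far as I understand the merge_samples documentation, OTU abundances are summed within each group. Summing columns that each total 100 and then rescaling the merged column back to 100 is the same as taking the within-group mean. Here is a minimal base-R sketch of that equivalence on made-up percentages; it does not use phyloseq, and pct, group, merged, renorm and group_means are hypothetical names.

```r
# Made-up percentages: 3 taxa (rows) x 5 samples (columns); each column sums to 100.
pct <- cbind(s1 = c(50, 30, 20), s2 = c(60, 30, 10), s3 = c(10, 20, 70),
             s4 = c(20, 20, 60), s5 = c(30, 10, 60))
rownames(pct) <- c("taxA", "taxB", "taxC")
group <- c("a", "a", "b", "b", "b")   # unequal group sizes: 2 and 3

# Merge step: sum the samples within each group, giving taxa x groups.
merged <- t(rowsum(t(pct), group))

# Rescale every merged column so that it sums to 100 again.
renorm <- sweep(merged, 2, colSums(merged), "/") * 100

# Within-group means computed directly from the per-sample percentages.
group_means <- sapply(split(seq_along(group), group),
                      function(idx) rowMeans(pct[, idx, drop = FALSE]))

# TRUE if both routes give the same taxa x groups matrix.
same <- isTRUE(all.equal(renorm, group_means, check.attributes = FALSE))
same
```

Taking rowMeans of either matrix then gives the mean of group means, not the mean over all samples.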