Python bar chart plot of 16S bacterial abundance
I have data like this (data.txt, a tab-delimited text file):
#genera data1 data2
Crocinitomix 0.000103252 0
Fluviicola 2.58E-05 0
uncultured 0.000180692 0.000103252
Actibacter 2.58E-05 0
Aquibacter 0.0003 0.002503872
Litoribaculum 0.000516262 0.1
Lutibacter 2.58E-05 0
Lutimonas 5.16E-05 0.00001
Ulvibacter 0 0
uncultured 0.00240062 0
Bacteroidetes bacterium 5.16E-05 2.58E-05
bacterium 0.000129066 0
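I want to create a bar chart like the one shown (an example taken from another page).
In this case I have two samples (data1 and data2), but there could be many more, and there could be hundreds or thousands of taxa (genera). Choosing colors one by one is impractical, so a color must be assigned to each taxon automatically. Does anyone have a Python script that can load a txt file in this format and plot it?
Sorry for not posting any code; I don't know how to code in Python. I tried QIIME, but I would have to remove a lot of text (e.g. D_0__Bacteria;D_1__Bacteroidetes;D_2__Flavobacteriia;D_3__Flavobacteriales;D_4__Cryomorphaceae;D_5__Fluviicola), so I wrote a Perl script to extract the genus (D_5__). Now I just need to plot it!
Thanks very much!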
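As an aside on the QIIME strings mentioned in the question, the genus extraction could also be done in Python instead of Perl. This is only a minimal sketch: it assumes the levels are separated by semicolons and carry a D_<n>__ prefix exactly as in the example string, and the function name extract_genus is hypothetical.

```python
def extract_genus(taxonomy, prefix="D_5__"):
    """Return the name at the level marked by `prefix`, or None if absent.

    Assumes semicolon-separated levels such as 'D_4__Cryomorphaceae'.
    """
    for level in taxonomy.split(";"):
        level = level.strip()
        if level.startswith(prefix):
            # Drop the 'D_5__' marker and keep only the genus name.
            return level[len(prefix):]
    return None


lineage = ("D_0__Bacteria;D_1__Bacteroidetes;D_2__Flavobacteriia;"
           "D_3__Flavobacteriales;D_4__Cryomorphaceae;D_5__Fluviicola")
print(extract_genus(lineage))  # Fluviicola
```

Lineages that stop above genus level return None, so they can be filtered out or relabeled before plotting.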
There are many ways to solve this problem. Here is a solution using pandas and bokeh:
import itertools

import pandas as pd
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral
from bokeh.plotting import figure

output_file("stacked.html")

df = pd.read_csv('data.txt', sep='\t')

# You have two rows with 'uncultured' data. I added these together.
# This may or may not be what you want.
# sort=False keeps the genera in the order they appear in the file.
df = df.groupby('#genera', sort=False).sum()

samples = list(df.columns)
organisms = list(df.index)

# Create a color iterator. Spectral[11] has only 11 colors, so they repeat
# once there are more organisms than that. If you have a large number of
# organisms, choose an appropriate palette from
# https://docs.bokeh.org/en/latest/docs/reference/palettes.html
color_iter = itertools.cycle(Spectral[11])
colors = [next(color_iter) for organism in organisms]

# Create a ColumnDataSource: one 'samples' column plus one column per organism
data = {'samples': samples}
for organism in organisms:
    data[organism] = list(df.loc[organism])
source = ColumnDataSource(data=data)

# Create our plot.
# height= and legend_label= are the current Bokeh names; older releases
# used plot_height= and legend= instead.
p = figure(x_range=samples, height=250, title="Species abundance",
           toolbar_location=None, tools="")
p.vbar_stack(organisms, x='samples', width=0.9, source=source,
             legend_label=organisms, color=colors)
p.xaxis.axis_label = 'Sample'
p.yaxis.axis_label = 'Value'
p.legend.location = "top_right"
p.legend.orientation = "vertical"

# Position the legend outside the plot area
p.add_layout(p.legend[0], 'right')

show(p)
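This creates a stacked bar chart, saved as stacked.html, with one bar per sample and one colored segment per genus.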
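With hundreds of genera, cycling through an 11-color palette repeats each color many times. One possible alternative, sketched here with only the standard library, spreads hues evenly around the HSV color wheel so that each genus gets its own hex color. The function name distinct_colors and the saturation and value defaults are my own choices, not part of Bokeh.

```python
import colorsys


def distinct_colors(n, saturation=0.65, value=0.9):
    """Return n hex color strings with hues spaced evenly around the wheel."""
    colors = []
    for i in range(n):
        # The hue runs from 0 up to (but not including) 1 in n equal steps.
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors


organisms = ["Crocinitomix", "Fluviicola", "uncultured", "Actibacter"]
colors = distinct_colors(len(organisms))
for name, color in zip(organisms, colors):
    print(name, color)
```

The resulting list can be passed as color=colors to vbar_stack in place of the cycled palette. With very many genera, neighboring hues will still be hard to tell apart by eye, so this only guarantees automatic assignment, not readability.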