How do I calculate the percentage amino acid composition of sequences in a large FASTA file?
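I want to calculate the amino acid composition of each sequence in a FASTA file individually, but I'm struggling to do it. I know I can use the code below, but that requires me to paste in each sequence separately instead of taking the whole FASTA file and computing the composition from it.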
from Bio.SeqUtils.ProtParam import ProteinAnalysis

X = ProteinAnalysis("MAEGEITTFTALTEKFNLPPGNYKKPKLLYCSNGGHFLRILPDGTVDGT"
                    "RDRSDQHIQLQLSAESVGEVYIKSTETGQYLAMDTSGLLYGSQTPSEEC"
                    "LFLERLEENHYNTYTSKKHAEKNWFVGLKKNGSCKRGPRTHYGQKAILF"
                    "LPLPV")
# Absolute counts of individual residues
print(X.count_amino_acids()['A'])
print(X.count_amino_acids()['E'])
# Relative content of individual residues (count / sequence length)
print("%0.2f" % X.get_amino_acids_percent()['K'])
print("%0.2f" % X.get_amino_acids_percent()['L'])
print("%0.2f" % X.molecular_weight())
print("%0.2f" % X.aromaticity())
print("%0.2f" % X.instability_index())
print("%0.2f" % X.isoelectric_point())
# Fractions of residues favouring (helix, turn, sheet); [0] is the helix fraction
sec_struc = X.secondary_structure_fraction()
print("%0.2f" % sec_struc[0])
# Extinction coefficient with reduced cysteines, then with disulfide bridges
epsilon_prot = X.molar_extinction_coefficient()
print(epsilon_prot[0])
print(epsilon_prot[1])
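To make the arithmetic behind "percentage composition" explicit: it is each residue's count divided by the sequence length, times 100. In the Biopython releases I'm familiar with, get_amino_acids_percent() returns that ratio as a fraction between 0 and 1, so multiply by 100 if you want a true percentage. A minimal stdlib-only sketch of the same calculation (aa_composition_percent is a hypothetical helper name, not a Biopython function):

```python
from collections import Counter

# The 20 standard amino acids as one-letter codes (assumed alphabet for this sketch)
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition_percent(seq):
    """Return {residue: percentage of total sequence length} for the 20 standard residues."""
    seq = seq.upper()
    counts = Counter(seq)
    length = len(seq)
    if length == 0:
        # Avoid division by zero on an empty record
        return {aa: 0.0 for aa in STANDARD_AA}
    return {aa: 100.0 * counts[aa] / length for aa in STANDARD_AA}


if __name__ == "__main__":
    comp = aa_composition_percent("MAEGEITTFTALTEK")
    print("%0.2f" % comp["T"])  # prints 26.67 (4 of 15 residues)
```

Restricting the output to the 20 standard codes is a choice made here: any other characters (X, *, gaps) still count toward the length, so the percentages then sum to less than 100.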
You just need to use SeqIO.parse() to read the sequences from the FASTA file:
from Bio import SeqIO
from Bio.SeqUtils.ProtParam import ProteinAnalysis

# SeqIO.parse() yields one SeqRecord per sequence, so the file is processed record by record
for record in SeqIO.parse('myfasta.fa', 'fasta'):
    # Pass the sequence to ProteinAnalysis as a plain string
    X = ProteinAnalysis(str(record.seq))
    print('\n### Results for record: {} ###'.format(record.id))
    print(X.count_amino_acids()['A'])
    print(X.count_amino_acids()['E'])
    print("%0.2f" % X.get_amino_acids_percent()['K'])
    print("%0.2f" % X.get_amino_acids_percent()['L'])
    print("%0.2f" % X.molecular_weight())
    print("%0.2f" % X.aromaticity())
    print("%0.2f" % X.instability_index())
    print("%0.2f" % X.isoelectric_point())
    sec_struc = X.secondary_structure_fraction()
    print("%0.2f" % sec_struc[0])
    epsilon_prot = X.molar_extinction_coefficient()
    print(epsilon_prot[0])
    print(epsilon_prot[1])
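If Biopython isn't installed, or you just want to see what that loop is doing, the same record-by-record approach can be sketched with the standard library alone. read_fasta and composition_percent are hypothetical helpers, not Biopython functions; the reader assumes header lines start with '>' and takes the first word of the header as the ID, roughly like record.id. Because it yields one record at a time, memory use is bounded by the longest sequence rather than the whole file, which matters for a large FASTA.

```python
import sys
from collections import Counter

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def read_fasta(lines):
    """Yield (record_id, sequence) from an iterable of FASTA lines (hypothetical helper)."""
    record_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            words = line[1:].split()
            record_id = words[0] if words else ""  # ID = first word of the header
            chunks = []
        elif record_id is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if record_id is not None:
        yield record_id, "".join(chunks)


def composition_percent(seq):
    """Percentage of each standard residue relative to the full sequence length."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    return {aa: (100.0 * counts[aa] / n if n else 0.0) for aa in STANDARD_AA}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for rec_id, seq in read_fasta(fh):
            comp = composition_percent(seq)
            print("\n### Results for record: {} ###".format(rec_id))
            print(" ".join("%s=%0.2f" % (aa, comp[aa]) for aa in STANDARD_AA))
```

Run it with the FASTA path as the first argument; with no argument it does nothing, so the functions can also be imported elsewhere.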
I think you want something from the FastaIO module, for example:
from Bio.SeqUtils.ProtParam import ProteinAnalysis
from Bio.SeqIO import FastaIO

with open('myfile.fasta') as fd:
    # SimpleFastaParser yields (title, sequence) string tuples without building SeqRecord objects
    for name, sequence in FastaIO.SimpleFastaParser(fd):
        X = ProteinAnalysis(sequence)
        print(name, X.count_amino_acids()['A'])
and likewise for anything else you want to calculate.
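Since the goal is the full composition of every sequence rather than one residue at a time, one option is a table with one row per sequence and one column per amino acid, which opens directly in a spreadsheet or pandas. A sketch assuming the 20 standard residues as columns; composition_rows and write_composition_table are hypothetical helpers. They accept any iterable of (title, sequence) pairs, which is what FastaIO.SimpleFastaParser yields, so the demo uses a hard-coded list to stay dependency-free.

```python
import csv
import sys
from collections import Counter

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def composition_rows(records):
    """Yield [title, length, %A, %C, ...] for each (title, sequence) pair."""
    for title, seq in records:
        seq = seq.upper()
        counts = Counter(seq)
        n = len(seq)
        row = [title, n]
        for aa in STANDARD_AA:
            # Percent of the full length; non-standard characters still count in n
            pct = 100.0 * counts[aa] / n if n else 0.0
            row.append("%0.2f" % pct)
        yield row


def write_composition_table(records, out):
    """Write a tab-separated table: header row, then one row per sequence."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["name", "length"] + list(STANDARD_AA))
    writer.writerows(composition_rows(records))


if __name__ == "__main__":
    demo = [("seq1 example", "MKKLL"), ("seq2", "AAAA")]
    write_composition_table(demo, sys.stdout)
```

With Biopython installed you could instead pass `FastaIO.SimpleFastaParser(fd)` as `records` inside the `with open(...)` block from the answer above, and an output file opened for writing in place of sys.stdout.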