How can I fix this error: "BiopythonWarning: Partial codon, len(sequence) not a multiple of three."?
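For an assignment, I need to write code that translates RNA sequences from a FASTA file into amino acid sequences. However, I keep getting the following warning message:
"BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future."
I tried adding trailing N's, but it still doesn't seem to work. I think there may be a mistake in my code, but I'm not sure where.
Here is my code: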
from Bio.Seq import Seq
from Bio import SeqIO

seq_records = SeqIO.parse('rna.fasta', 'fasta')
amino_acids1 = []
amino_acids2 = []
amino_acids3 = []

for record in seq_records:
    # starting from nucleotide 1
    if len(record) % 3 == 0:
        amino_acids1.append(record.translate())
    elif (len(record) + 1) % 3 == 0:
        recordN = record + Seq('N')
        amino_acids1.append(recordN.translate())
    elif (len(record) + 2) % 3 == 0:
        recordNN = record + Seq('N') + Seq('N')
        amino_acids1.append(recordNN.translate())
    print("FIRST")
    print(amino_acids1)
    with open('rna_out.fasta', 'w') as p_file:
        SeqIO.write(amino_acids1, p_file, 'fasta')

    # starting from nucleotide 2
    record2 = record[1:]
    if len(record2) % 3 == 0:
        amino_acids2.append(record2.translate())
    elif (len(record2) + 1) % 3 == 0:
        record2N = record + Seq('N')
        amino_acids2.append(record2N.translate())
    elif (len(record2) + 2) % 3 == 0:
        record2NN = record + Seq('N') + Seq('N')
        amino_acids2.append(record2NN.translate())
    print("SECOND")
    print(amino_acids2)
    with open('rna_out.fasta', 'w') as p_file:
        SeqIO.write(amino_acids2, p_file, 'fasta')

    # starting from nucleotide 3
    record3 = record[2:]
    if len(record3) % 3 == 0:
        amino_acids3.append(record3.translate())
    elif (len(record3) + 1) % 3 == 0:
        record3N = record + Seq('N')
        amino_acids3.append(record3N.translate())
    elif (len(record3) + 2) % 3 == 0:
        record3NN = record + Seq('N') + Seq('N')
        amino_acids3.append(record3NN.translate())
    print("THIRD")
    print(amino_acids3)
    with open('rna_out.fasta', 'w') as p_file:
        SeqIO.write(amino_acids3, p_file, 'fasta')
Normally, this should give 3 possible translations for each sequence in the FASTA file. However, the output doesn't seem right.
These are the first 3 lines, which should be 3 different translations of the first sequence in the FASTA file:
FIRST
[SeqRecord(seq=Seq('GAKRTDRTSVINKLSLLYTSCETIDCYIFFL', HasStopCodon(ExtendedIUPACProtein(), '')), id='', name='', description='', dbxrefs=[])]
SECOND
[SeqRecord(seq=Seq('GAKRTDRTSVINKLSLLYTSCETIDCYIFFL', HasStopCodon(ExtendedIUPACProtein(), '')), id='', name='', description='', dbxrefs=[])]
THIRD
[SeqRecord(seq=Seq('CQKNSDVVVGHQTVVALHVMRNDLLYLFP', HasStopCodon(ExtendedIUPACProtein(), '')), id='', name='', description='', dbxrefs=[])]
I don't know what's wrong, but this is definitely not a correct translation. If you can see where I made a mistake, I'd really appreciate your help!
Your approach can work, but your code contains copy-and-paste errors:
record2 = record[1:]
if len(record2) % 3 == 0:
    amino_acids2.append(record2.translate())
elif (len(record2) + 1) % 3 == 0:
    record2N = record + Seq('N')
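Note the record in the last line: it should be record2. You made this mistake at least four times. Padding record instead of record2 translates the frame-1 nucleotides again, which is why your first two outputs are identical. I believe the code @Chris_Rands points you to offers valuable insights into the problem, such as also translating the reverse complement, but I don't recommend the pad_seq() function in that code.
Below is a rework of pad_seq(), integrated into your code: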
from Bio.Seq import Seq
from Bio import SeqIO

def pad_seq(sequence):
    """Pad sequence to a multiple of 3 with N."""
    remainder = len(sequence) % 3
    return sequence if remainder == 0 else sequence + Seq('N' * (3 - remainder))

seq_records = SeqIO.parse('rna.fasta', 'fasta')
amino_acids1 = []
amino_acids2 = []
amino_acids3 = []

for record in seq_records:
    # starting from nucleotide 1
    amino_acids1.append(pad_seq(record).translate())
    print("FIRST")
    print(amino_acids1)
    # ...

    # starting from nucleotide 2
    record2 = record[1:]
    amino_acids2.append(pad_seq(record2).translate())
    print("SECOND")
    print(amino_acids2)
    # ...

    # starting from nucleotide 3
    record3 = record[2:]
    amino_acids3.append(pad_seq(record3).translate())
    print("THIRD")
    print(amino_acids3)
    # ...
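If it helps to see the length arithmetic on its own, here is a minimal sketch on plain Python strings, so it runs without Biopython. The names pad_to_codon, trim_to_codon and three_frames, and the 17-nucleotide example sequence, are made up for illustration and are not part of Biopython or of your assignment. The trimming variant covers the other option the warning mentions ("explicitly trim the sequence").

```python
def pad_to_codon(sequence, pad_char="N"):
    """Pad sequence with pad_char up to the next multiple of three."""
    remainder = len(sequence) % 3
    return sequence if remainder == 0 else sequence + pad_char * (3 - remainder)


def trim_to_codon(sequence):
    """Drop any trailing partial codon instead of padding it."""
    return sequence[: len(sequence) - len(sequence) % 3]


def three_frames(sequence, fix=pad_to_codon):
    """Return the three forward reading frames, each a whole number of codons.

    The fix is applied to the sliced frame itself, not to the original
    sequence, which is exactly the step the copy-and-paste error skipped.
    """
    return [fix(sequence[offset:]) for offset in range(3)]


# Hypothetical example sequence (17 nucleotides, not a multiple of three).
rna = "AUGGCCAUUGUAAUGGG"

for number, frame in enumerate(three_frames(rna), start=1):
    print(number, frame, len(frame))

for number, frame in enumerate(three_frames(rna, fix=trim_to_codon), start=1):
    print(number, frame, len(frame))
```

Every frame comes out with a length that is a multiple of three, whichever fix you choose. Padding keeps the trailing bases and leaves the last codon ambiguous, while trimming discards them, so which one is appropriate depends on what your assignment expects.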