KeyError: 'mtD' when 'mtD' is nowhere to be found in the relevant code
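I am using a simple function to translate DNA sequences into amino acid sequences. At a high level the code looks fine, but whenever I run the program I get the error KeyError: 'mtD'. The error apparently originates on line 26 (if table[seq[i:i+3]] == "_":). The only place 'mtD' appears in my program is when I simply print my dataset to the console, which makes the problem even more puzzling. My code is shown below.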
# Creating the protein sequence column for the data
Protein_Sequence = []

# dna to protein sequence function
def translate11(seq):
    table = {"TTT" : "F", "CTT" : "L", "ATT" : "I", "GTT" : "V",
             "TTC" : "F", "CTC" : "L", "ATC" : "I", "GTC" : "V",
             "TTA" : "L", "CTA" : "L", "ATA" : "I", "GTA" : "V",
             "TTG" : "L", "CTG" : "L", "ATG" : "M", "GTG" : "V",
             "TCT" : "S", "CCT" : "P", "ACT" : "T", "GCT" : "A",
             "TCC" : "S", "CCC" : "P", "ACC" : "T", "GCC" : "A",
             "TCA" : "S", "CCA" : "P", "ACA" : "T", "GCA" : "A",
             "TCG" : "S", "CCG" : "P", "ACG" : "T", "GCG" : "A",
             "TAT" : "Y", "CAT" : "H", "AAT" : "N", "GAT" : "D",
             "TAC" : "Y", "CAC" : "H", "AAC" : "N", "GAC" : "D",
             "TAA" : "_", "CAA" : "Q", "AAA" : "K", "GAA" : "E",
             "TAG" : "_", "CAG" : "Q", "AAG" : "K", "GAG" : "E",
             "TGT" : "C", "CGT" : "R", "AGT" : "S", "GGT" : "G",
             "TGC" : "C", "CGC" : "R", "AGC" : "S", "GGC" : "G",
             "TGA" : "_", "CGA" : "R", "AGA" : "R", "GGA" : "G",
             "TGG" : "W", "CGG" : "R", "AGG" : "R", "GGG" : "G"
             }
    pro_sequence = " "
    for i in range(0, len(seq)-(3+len(seq)%3), 3):
        if table[seq[i:i+3]] == "_" :
            break
        pro_sequence += table[seq[i:i+3]]
    return pro_sequence

newthang = df.mtDNA_Sequence
for thang in newthang:
    x = translate11(thang)
    Protein_Sequence.append(x)
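Your function works for me. I tried it with a short nucleotide sequence and it gave a sensible translation.
The for loop stops one amino acid short, so you can remove the 3+ :
for i in range(0, len(seq)-(len(seq)%3), 3):
Also, when you declare pro_sequence, start with an empty string "" rather than a space character " ".
After those small changes, I tried the following: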
sequence = "tactgtggctactcagctgtgcgcatggcccgcctgctgtcaccaggggcgaggctcatcaccatcgagatcaaccccgactgtgccgccatcacccagcggatggtggatttcgctggcatgaaggacaag"
print(translate11(sequence.upper()))
# YCGYSAVRMARLLSPGARLITIEINPDCAAITQRMVDFAGMKDK
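That is the correct translation.
So one of the inputs you are passing to the function (from df.mtDNA_Sequence) must start with or contain the letters "mtD" rather than being a plain string of nucleotides.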
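To see how a key like 'mtD' can reach the lookup, and to locate the entries responsible, here is a minimal sketch. It uses a plain list as a hypothetical stand-in for the values of df.mtDNA_Sequence. The stray label "mtDNA_Sequence" in it is only an assumption about what the bad entry might look like (for example a header that ended up among the data), not something confirmed by the question. The names sequences, first_codon and is_clean are made up for illustration.

```python
# Hypothetical stand-in for the values of df.mtDNA_Sequence. The second entry
# is an ASSUMED example of a stray label mixed in with the real sequences.
sequences = ["ATGGCCTAA", "mtDNA_Sequence", "ATGNNNTGA"]

VALID_BASES = set("ACGT")

def first_codon(seq):
    """Return the first three characters, i.e. the first key translate11 looks up."""
    return seq[0:3]

def is_clean(seq):
    """True if the entry is a string made only of uppercase A, C, G and T."""
    return isinstance(seq, str) and set(seq) <= VALID_BASES

# Collect (position, value) pairs for every entry that would break the lookup.
bad = [(i, s) for i, s in enumerate(sequences) if not is_clean(s)]

print(first_codon("mtDNA_Sequence"))  # mtD
print(bad)  # [(1, 'mtDNA_Sequence'), (2, 'ATGNNNTGA')]
```

Running the same kind of check over the real column should show which rows are not pure nucleotide strings.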
Try adding another condition that breaks out of the for loop if the characters are not a recognized codon:
for i in range(0, len(seq)-(len(seq)%3), 3):
    # stop at anything that is not a known codon
    if seq[i:i+3] not in table:
        break
    # stop at a stop codon
    if table[seq[i:i+3]] == "_" :
        break
    pro_sequence += table[seq[i:i+3]]
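Putting the three suggestions together (no 3+ in the range, an empty starting string, and a guard for unknown codons), the whole function might look like the sketch below. This is one possible arrangement, not the only one. The constants BASES, AMINO_ACIDS and TABLE are names introduced here; they rebuild the same 64-entry table as in the question from the conventional T, C, A, G codon ordering instead of writing it out by hand. Upper-casing inside the function is an added convenience, not part of the original code.

```python
from itertools import product

# Same codon table as in the question, built from the conventional TCAG
# ordering; "_" marks a stop codon, as in the original dictionary.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE = {"".join(codon): aa
         for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate11(seq):
    """Translate seq codon by codon, stopping at a stop or unknown codon."""
    seq = seq.upper()  # assumption: lowercase input should be accepted too
    protein = []
    # Drop the trailing 1 or 2 bases that do not form a complete codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        amino_acid = TABLE.get(seq[i:i + 3])  # None for an unknown codon
        if amino_acid is None or amino_acid == "_":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate11("atggcctaaggg"))    # MA  (stops at the TAA stop codon)
print(translate11("mtDNA_Sequence"))  # prints an empty line: first codon unknown
```

Breaking silently on an unknown codon keeps the loop over the column running, but it also hides bad rows. Raising a ValueError that names the offending codon would be a reasonable alternative if you would rather find out about them.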