KeyError: 'mtD' when 'mtD' is nowhere to be found in the relevant code

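I'm using a simple function to translate DNA sequences into amino acid sequences. At a high level the code looks fine, but whenever I run the program I get the error KeyError: 'mtD', which apparently originates on line 26 (if table[seq[i:i+3]] == "_" :). The only place 'mtD' shows up in my program is when I print my dataset to the console, which makes the problem even more puzzling. My code is shown below.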

#Creating the protein sequence column for the data
Protein_Sequence = []

#dna to protein sequence function
def translate11(seq): 
  table = {"TTT" : "F", "CTT" : "L", "ATT" : "I", "GTT" : "V",
           "TTC" : "F", "CTC" : "L", "ATC" : "I", "GTC" : "V",
           "TTA" : "L", "CTA" : "L", "ATA" : "I", "GTA" : "V",
           "TTG" : "L", "CTG" : "L", "ATG" : "M", "GTG" : "V",
           "TCT" : "S", "CCT" : "P", "ACT" : "T", "GCT" : "A",
           "TCC" : "S", "CCC" : "P", "ACC" : "T", "GCC" : "A",
           "TCA" : "S", "CCA" : "P", "ACA" : "T", "GCA" : "A",
           "TCG" : "S", "CCG" : "P", "ACG" : "T", "GCG" : "A",
           "TAT" : "Y", "CAT" : "H", "AAT" : "N", "GAT" : "D",
           "TAC" : "Y", "CAC" : "H", "AAC" : "N", "GAC" : "D",
           "TAA" : "_", "CAA" : "Q", "AAA" : "K", "GAA" : "E",
           "TAG" : "_", "CAG" : "Q", "AAG" : "K", "GAG" : "E",
           "TGT" : "C", "CGT" : "R", "AGT" : "S", "GGT" : "G",
           "TGC" : "C", "CGC" : "R", "AGC" : "S", "GGC" : "G",
           "TGA" : "_", "CGA" : "R", "AGA" : "R", "GGA" : "G",
           "TGG" : "W", "CGG" : "R", "AGG" : "R", "GGG" : "G" 
           }
  pro_sequence =" "

  for i in range(0, len(seq)-(3+len(seq)%3), 3):
    if table[seq[i:i+3]] == "_" :
        break
    pro_sequence += table[seq[i:i+3]]

     
  return pro_sequence

newthang = df.mtDNA_Sequence
for thang in newthang:
  x = translate11(thang)
  Protein_Sequence.append(x)

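Your function works for me: I tried it with a short nucleotide sequence and it gave the correct translation.

The for loop stops one amino acid short, so you can remove the 3+ from the range: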

for i in range(0, len(seq)-(len(seq)%3), 3):

Also, when you declare pro_sequence, start with an empty string "" rather than a space character " ".
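To see why the original range drops the last codon, here is a small sketch comparing the two range expressions. The 9-base sequence is a made-up example, not taken from the question's data.

```python
# Hypothetical 9-base sequence: exactly 3 complete codons.
seq = "ATGGCCAAG"

# Range from the question: subtracts an extra 3, so the last codon is skipped.
original = list(range(0, len(seq) - (3 + len(seq) % 3), 3))

# Corrected range: only trims the incomplete trailing bases, if any.
corrected = list(range(0, len(seq) - (len(seq) % 3), 3))

print(original)   # [0, 3]
print(corrected)  # [0, 3, 6]
```

The corrected range still ignores a trailing partial codon, since len(seq) % 3 removes any leftover one or two bases.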

After those small changes, I tried the following:

sequence = "tactgtggctactcagctgtgcgcatggcccgcctgctgtcaccaggggcgaggctcatcaccatcgagatcaaccccgactgtgccgccatcacccagcggatggtggatttcgctggcatgaaggacaag"
print(translate11(sequence.upper()))

# YCGYSAVRMARLLSPGARLITIEINPDCAAITQRMVDFAGMKDK

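which is the correct translation.

So one of the inputs you are giving the function (from df.mtDNA_Sequence) must begin with or contain the letters "mtD" rather than being purely a string of nucleotides. Note that "mtD" is also the first three letters of the column name mtDNA_Sequence, so it may be that the header text has ended up among the values.

Try adding another condition that breaks out of the for loop if the characters are not a recognized codon: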
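One way to track down the offending entry is to scan the values for anything that is not a pure A/C/G/T string. The following is only a sketch: find_invalid_sequences and the sample list are hypothetical names and data standing in for df.mtDNA_Sequence.

```python
VALID_BASES = set("ACGT")

def find_invalid_sequences(sequences):
    """Return (position, value) pairs for entries that are not pure A/C/G/T strings."""
    bad = []
    for position, value in enumerate(sequences):
        # Flag non-strings, empty strings, and strings with other characters.
        # The check is case-insensitive, so lowercase bases are accepted here.
        if (not isinstance(value, str) or not value
                or not set(value.upper()) <= VALID_BASES):
            bad.append((position, value))
    return bad

# Hypothetical data standing in for df.mtDNA_Sequence
sample = ["ATGGCC", "mtDNA_Sequence", "ttgaca", None]
print(find_invalid_sequences(sample))  # [(1, 'mtDNA_Sequence'), (3, None)]
```

Iterating over a pandas Series yields its values, so calling find_invalid_sequences(df.mtDNA_Sequence) should behave the same way. Keep in mind that translate11 itself is case-sensitive, so lowercase sequences would still need .upper() before translation.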

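for i in range(0, len(seq)-(len(seq)%3), 3):
  codon = seq[i:i+3]
  # stop if the slice is not a recognized codon (e.g. 'mtD')
  if codon not in table:
    break
  # stop at a stop codon
  if table[codon] == "_":
    break
  pro_sequence += table[codon]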