SeqIO.parse python: Premature end of line during features table

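Has anyone run into this problem? Any suggestions about the cause?

The script creates the files containing the genome sequences, but the error shows up at the end of the process.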

The line in my script:

File "scripts/list_ncbi_download_genome_vs_02.py", line 97, in <module>
    SeqIO.write(SeqIO.parse(genbank_file, "genbank"), genome_file, "fasta")

The traceback that appears:

  File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/__init__.py", line 481, in write
    count = writer_class(fp).write_file(sequences)
  File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/Interfaces.py", line 209, in write_file
    count = self.write_records(records)
  File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/Interfaces.py", line 193, in write_records
    for record in records:
  File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/__init__.py", line 600, in parse
    for r in i:
  File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 478, in parse_records
    record = self.parse(handle, do_features)
  File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 462, in parse
    if self.feed(handle, consumer, do_features):
  File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 434, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 159, in parse_features
    raise ValueError("Premature end of line during features table")

I can live with this, but it is not nice to finish a whole process and then have this error show up.

The file is available for download at https://github.com/felipelira/files_to_test/blob/master/GCF_000302915.1_Pav631_1.0_genomic.gbff

The block in my script that calls the command is:

## rename and move files to the output directory created in the command line:
genome_dict = {}
genome_list = []
for genbank_file in list_uncompressed:
    organism = genbank_file.split('/')[0]
    file_name = genbank_file.split('/')[-1]
    genome_file = organism +'_'+ file_name.split('_')[0] +'_'+ file_name.split('_')[1]+'.fna'
    genome_list.append(genome_file)
    genome_dict[genome_file.replace('.fna', '')] = organism
    # print genome_dict
    print "Dealing with GenBank record %s" % genome_file
    SeqIO.write(SeqIO.parse(genbank_file, "genbank"), os.path.join(outdir, genome_file), "fasta")
    print "Genome saved %s" % genome_file

The problem was solved following the suggestions in the post at biostars.org: https://www.biostars.org/p/289314/#289407

Suggestion from Philipp Bayer: https://www.biostars.org/u/4678/

Normally this should work (and it does on my system). Are you writing to the genbank_file in the script before that? Perhaps you haven't closed the file handle yet so that writing to the file hasn't synced?
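
To illustrate the point about unsynced file handles, here is a minimal sketch using only the standard library (no Biopython). The helper name bytes_visible_on_disk and the tiny GenBank-like text are made up for the demonstration. It writes a small file and measures how much of it is on disk before and after the writer is closed. While the handle is still open, a reader will typically see fewer bytes than were written (often none), and a parser reading the file at that moment could see a truncated features table. Whether this is what happened in the original script is only a guess.

```python
import os
import tempfile


def bytes_visible_on_disk(close_first):
    """Write a small text file and return (bytes on disk, bytes written).

    close_first=True closes the writer before measuring the file;
    close_first=False measures while the writer is still open.
    """
    # Hypothetical stand-in for a freshly written GenBank file.
    text = "LOCUS       demo\nFEATURES             Location/Qualifiers\n//\n"
    path = os.path.join(tempfile.mkdtemp(), "demo.gbff")

    # newline="\n" keeps the byte count equal to len(text) on every platform.
    out = open(path, "w", newline="\n")
    out.write(text)
    if close_first:
        out.close()  # flushes Python's write buffer to disk
    on_disk = os.path.getsize(path)
    out.close()  # closing an already closed file is harmless
    return on_disk, len(text)


closed_size, total = bytes_visible_on_disk(close_first=True)
open_size, _ = bytes_visible_on_disk(close_first=False)
print("after close: %d of %d bytes on disk" % (closed_size, total))
print("still open:  %d of %d bytes on disk" % (open_size, total))
```

If the GenBank file is produced earlier in the same script (for example by decompressing a download), closing that handle before calling SeqIO.parse avoids this situation.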

And from a.zielezinski: https://www.biostars.org/u/4700/

from Bio import SeqIO

l = ['GCF_000302915.1_Pav631_1.0_genomic.gbff']
for genbank_file in l:
    fh = open(genbank_file)
    oh = open(genbank_file + '.fasta', 'w')
    for seq_record in SeqIO.parse(fh, 'genbank'):
        oh.write(seq_record.format('fasta'))
    oh.close()
    fh.close()
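
The same conversion can also be written with a with statement, so both handles are closed even if the parser raises partway through. This is only a sketch: the function name genbank_to_fasta is my own, and it assumes Biopython is installed. It will not repair an input file that is itself truncated. It only guarantees that the handles are closed.

```python
def genbank_to_fasta(genbank_path, fasta_path):
    """Convert every record in a GenBank file to FASTA; return the record count."""
    # Imported here so this sketch can be loaded even where Biopython is missing.
    from Bio import SeqIO

    # Both files are closed when the block exits, including on an exception.
    with open(genbank_path) as in_handle, open(fasta_path, "w") as out_handle:
        return SeqIO.write(SeqIO.parse(in_handle, "genbank"), out_handle, "fasta")


# Example call (path taken from the snippet above):
# genbank_to_fasta("GCF_000302915.1_Pav631_1.0_genomic.gbff",
#                  "GCF_000302915.1_Pav631_1.0_genomic.gbff.fasta")
```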