Issue handling file from command line with biopython SeqIO

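This is my first attempt at using command-line arguments rather than the quick and dirty sys.argv[], and at writing more 'proper' Python scripts. For some reason that I can't figure out, it seems to object to the way I'm trying to use the input file from the command line.

The script is meant to take an input file and some numeric indices, then slice out a subset region of the file, but I keep getting an error saying that the variable I assign the input file to is not defined: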

joehealey@7c-d1-c3-89-86-2c:~/Documents/Warwick/PhD/Scripts$ python slice_genbank.py --input PAU_06042014.gbk -o test.gbk -s 3907329 -e 3934427
Traceback (most recent call last):
  File "slice_genbank.py", line 70, in <module>
    sub_record = record[start:end]
NameError: name 'record' is not defined

Here's the code. Where am I going wrong? (I'm sure it's something simple):

#!/usr/bin/python

# This script is designed to take a genbank file and 'slice out'/'subset'
# regions (genes/operons etc.) and produce a separate file.

# Based upon the tutorial at http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc44

# Set up and handle arguments:
from Bio import SeqIO
import getopt


def main(argv):
    record = ''
    start = ''
    end = ''
    try:
        opts, args = getopt.getopt(argv, 'hi:o:s:e:', [
                                                   'help',
                                                   'input=',
                                                   'outfile=',
                                                   'start=',
                                                   'end='
                                                   ]
                              )
        if not opts:
            print "No options supplied. Aborting."
            usage()
            sys.exit(2)
    except getopt.GetoptError:
        print "Some issue with commandline args.\n"
        usage()
        sys.exit(2)

    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit(2)
        elif opt in ("-i", "--input"):
            filename = arg
            record = SeqIO.read(arg, "genbank")
        elif opt in ("-o", "--outfile"):
            outfile = arg
        elif opt in ("-s", "--start"):
            start = arg
        elif opt in ("-e", "--end"):
            end = arg
    print("Slicing " + filename + " from " + str(start) + " to " + str(end))

def usage():
    print(
"""
This script 'slices' entries such as genes or operons out of a genbank,
subsetting them as their own file.

Usage:
python slice_genbank.py -h|--help -i|--input <genbank> -o|--outfile <genbank> -s|--start <int> -e|--end <int>

Options:

-h|--help       Displays this usage message. No options will also do this.
-i|--input      The genbank file you wish to subset a record from.
-o|--outfile    The file name you wish to give to the new sliced genbank.
-s|--start      An integer base index to slice the record from.
-e|--end        An integer base index to slice the record to.
"""
      )

#Do the slicing
sub_record = record[start:end]
SeqIO.write(sub_record, outfile, "genbank")

if __name__ == "__main__":
 main(sys.argv[1:])

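It's also possible that there's a problem with the SeqIO.write syntax, but I haven't got that far yet.

EDIT:

I also forgot to say that when I use `record = SeqIO.read("file.gbk", "genbank")` and write the file name directly into the script, it works correctly.

As stated in the comments, your variable record is only defined inside the function main() (the same goes for start and end), so it is not visible to the rest of the program. On top of that, the slicing code sits at module level above the `if __name__ == "__main__":` block, so it runs before main() has even been called. You can return the values like this: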

def main(argv):
    ...
    ...
    return record, start, end

Your call to main(), placed before the slicing code, could then look like this:

record, start, end = main(sys.argv[1:])
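Put together, the return-based approach might look like the sketch below. It is written for Python 3, and the function names `parse_args` and `slice_genbank` are hypothetical, chosen only for illustration. It also converts start and end with int(), because getopt hands every option value back as a string and a sequence cannot be sliced with strings. The Biopython import sits inside the function purely so that the parsing step can be tried without Biopython installed.

```python
import getopt


def parse_args(argv):
    """Parse the options and hand them back to the caller (hypothetical helper)."""
    filename = outfile = start = end = None
    opts, _ = getopt.getopt(
        argv, "hi:o:s:e:", ["help", "input=", "outfile=", "start=", "end="]
    )
    for opt, arg in opts:
        if opt in ("-i", "--input"):
            filename = arg
        elif opt in ("-o", "--outfile"):
            outfile = arg
        elif opt in ("-s", "--start"):
            start = int(arg)  # getopt returns strings; slicing needs integers
        elif opt in ("-e", "--end"):
            end = int(arg)
    return filename, outfile, start, end


def slice_genbank(filename, outfile, start, end):
    """Read one GenBank record, slice it and write the slice (hypothetical helper)."""
    from Bio import SeqIO  # imported here so parse_args() works without Biopython

    record = SeqIO.read(filename, "genbank")
    sub_record = record[start:end]
    SeqIO.write(sub_record, outfile, "genbank")
```

In the script itself, the block under `if __name__ == "__main__":` would then call `filename, outfile, start, end = parse_args(sys.argv[1:])` followed by `slice_genbank(filename, outfile, start, end)`, so the slicing only happens once the values exist.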

Alternatively, you can move the main functionality into the main function (as you did).
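A rough sketch of that second layout, again assuming Python 3: everything from parsing to writing happens inside main(), so record, start and end never need to leave the function. The `USAGE` string and the integer return codes are assumptions made for this example and are not part of the original script.

```python
import getopt

# Shortened usage text, assumed here for illustration only.
USAGE = "Usage: python slice_genbank.py -i <genbank> -o <genbank> -s <int> -e <int>"


def main(argv):
    """Parse the options, slice the record and write it out; return an exit code."""
    filename = outfile = start = end = None
    try:
        opts, _ = getopt.getopt(
            argv, "hi:o:s:e:", ["help", "input=", "outfile=", "start=", "end="]
        )
    except getopt.GetoptError as err:
        print(err)
        print(USAGE)
        return 2

    for opt, arg in opts:
        if opt in ("-h", "--help"):
            print(USAGE)
            return 0
        elif opt in ("-i", "--input"):
            filename = arg
        elif opt in ("-o", "--outfile"):
            outfile = arg
        elif opt in ("-s", "--start"):
            start = int(arg)  # option values arrive as strings
        elif opt in ("-e", "--end"):
            end = int(arg)

    if None in (filename, outfile, start, end):
        print("Missing required option.")
        print(USAGE)
        return 2

    # Deferred import: only needed once the arguments are known to be complete.
    from Bio import SeqIO

    record = SeqIO.read(filename, "genbank")
    sub_record = record[start:end]
    SeqIO.write(sub_record, outfile, "genbank")
    return 0
```

The entry point then shrinks to `sys.exit(main(sys.argv[1:]))` under the `if __name__ == "__main__":` guard (with `import sys` at the top of the script), and no slicing code remains at module level.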

(Another option is to define the variables in the main program and use the global keyword inside the function, but this is not recommended.)