Issue handling file from command line with biopython SeqIO
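This is my first attempt at using command-line arguments instead of the quick and dirty sys.argv[] and at writing more 'proper' Python scripts. For some reason I can't figure out, it seems to object to the way I'm trying to pass an input file in from the command line.
The script is meant to take an input file and some numeric indices, then slice a subset region out of the file, but I keep getting an error saying that the variable I assign the passed-in file to is not defined: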
joehealey@7c-d1-c3-89-86-2c:~/Documents/Warwick/PhD/Scripts$ python slice_genbank.py --input PAU_06042014.gbk -o test.gbk -s 3907329 -e 3934427
Traceback (most recent call last):
File "slice_genbank.py", line 70, in <module>
sub_record = record[start:end]
NameError: name 'record' is not defined
Here is the code. Where am I going wrong? (I'm sure it's something simple):
#!/usr/bin/python
# This script is designed to take a genbank file and 'slice out'/'subset'
# regions (genes/operons etc.) and produce a separate file.
# Based upon the tutorial at http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc44
# Set up and handle arguments:
import sys
import getopt
from Bio import SeqIO

def main(argv):
    record = ''
    start = ''
    end = ''
    try:
        opts, args = getopt.getopt(argv, 'hi:o:s:e:', [
            'help',
            'input=',
            'outfile=',
            'start=',
            'end='
            ]
        )
        if not opts:
            print("No options supplied. Aborting.")
            usage()
            sys.exit(2)
    except getopt.GetoptError:
        print("Some issue with commandline args.\n")
        usage()
        sys.exit(2)

    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit(2)
        elif opt in ("-i", "--input"):
            filename = arg
            record = SeqIO.read(arg, "genbank")
        elif opt in ("-o", "--outfile"):
            outfile = arg
        elif opt in ("-s", "--start"):
            start = arg
        elif opt in ("-e", "--end"):
            end = arg
    print("Slicing " + filename + " from " + str(start) + " to " + str(end))

def usage():
    print(
"""
This script 'slices' entries such as genes or operons out of a genbank,
subsetting them as their own file.
Usage:
python slice_genbank.py -h|--help -i|--input <genbank> -o|--outfile <genbank> -s|--start <int> -e|--end <int>
Options:
-h|--help      Displays this usage message. No options will also do this.
-i|--input     The genbank file you wish to subset a record from.
-o|--outfile   The file name you wish to give to the new sliced genbank.
-s|--start     An integer base index to slice the record from.
-e|--end       An integer base index to slice the record to.
"""
    )

#Do the slicing
sub_record = record[start:end]
SeqIO.write(sub_record, outfile, "genbank")

if __name__ == "__main__":
    main(sys.argv[1:])
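It's also possible there is a problem with my SeqIO.write syntax, but I haven't got that far yet.
Edit:
I also forgot to mention that when I use `record = SeqIO.read("file.gbk", "genbank")` with the filename written directly into the script, it works fine.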
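The `SeqIO.write(sub_record, outfile, "genbank")` call appears to match Biopython's documented `SeqIO.write(sequences, handle, format)` form, which accepts a single SeqRecord and a filename. A separate problem is likely to show up once the NameError is fixed, though: getopt returns every option value as a string, so `record[start:end]` would receive string indices and would most likely fail with a TypeError. This standard-library-only snippet shows what getopt produces for the question's command line and converts the coordinates with int():

```python
import getopt

# The arguments from the question's command line
argv = ["--input", "PAU_06042014.gbk", "-o", "test.gbk",
        "-s", "3907329", "-e", "3934427"]
opts, args = getopt.getopt(
    argv, "hi:o:s:e:", ["help", "input=", "outfile=", "start=", "end="]
)
print(opts)
# [('--input', 'PAU_06042014.gbk'), ('-o', 'test.gbk'), ('-s', '3907329'), ('-e', '3934427')]

# Every value is a str, so convert the coordinates before slicing
values = dict(opts)
start = int(values["-s"])
end = int(values["-e"])
print(start, end)  # 3907329 3934427
```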
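As mentioned in the comments, your variable `record` is only defined inside the function `main()` (and the same goes for `start` and `end`), so it is not visible to the rest of the program.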
You can return the values like this:
def main(argv):
    ...
    ...
    return record, start, end
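Your call to `main()` might then look like this:
record, start, end = main(sys.argv[1:])
The slicing and writing code then has to come after that call (for example, inside the `if __name__ == "__main__":` block), not before it at module level.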
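A runnable sketch of the return-based fix. To keep it independent of Biopython and a real GenBank file, `load_record()` is a hypothetical stand-in for `SeqIO.read(filename, "genbank")` that returns a plain string, and the getopt loop is reduced to unpacking a three-item list. The coordinates are converted with int() because command-line values arrive as strings:

```python
def load_record(filename):
    # Hypothetical stand-in for SeqIO.read(filename, "genbank"),
    # so the sketch runs without Biopython or a real GenBank file
    return "ACGTACGTACGT"


def main(argv):
    # Simplified stand-in for the getopt loop: argv = [input, start, end]
    filename, start, end = argv
    record = load_record(filename)
    # Hand the values back to the caller instead of leaving them local
    return record, int(start), int(end)


# The caller unpacks the returned tuple into names it can use,
# and only slices after main() has run
record, start, end = main(["PAU_06042014.gbk", "2", "6"])
sub_record = record[start:end]
print(sub_record)  # GTAC
```

Because the slice happens after the call, it sees exactly the names `main()` handed back.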
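Alternatively, you can move the main functionality (the slicing and writing) into the `main()` function as well, alongside the argument handling you already put there.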
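A sketch of this second option, with reading, slicing and writing all inside `main()`. It keeps the question's getopt option string, but moves the parsing into a helper, `parse_args()`, which is a name introduced here and not part of the original script. Help handling and error reporting are left out for brevity, the script assumes all four options are supplied, and the Biopython import sits inside `main()` only so the parsing can be tried without Biopython installed:

```python
import getopt


def parse_args(argv):
    # Same short/long options as the question's script
    opts, _args = getopt.getopt(
        argv, "hi:o:s:e:", ["help", "input=", "outfile=", "start=", "end="]
    )
    settings = {}
    for opt, arg in opts:
        if opt in ("-i", "--input"):
            settings["input"] = arg
        elif opt in ("-o", "--outfile"):
            settings["outfile"] = arg
        elif opt in ("-s", "--start"):
            settings["start"] = int(arg)  # getopt returns strings
        elif opt in ("-e", "--end"):
            settings["end"] = int(arg)
    return settings


def main(argv):
    settings = parse_args(argv)
    # Imported here only so parse_args() can be exercised without Biopython
    from Bio import SeqIO

    record = SeqIO.read(settings["input"], "genbank")
    # Reading, slicing and writing all happen inside main(), so every
    # name is in scope where it is used
    sub_record = record[settings["start"]:settings["end"]]
    SeqIO.write(sub_record, settings["outfile"], "genbank")
```

The script is run as before, by calling `main(sys.argv[1:])` under the `if __name__ == "__main__":` guard (with `import sys` at the top).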
(A third way is to define the variables in the main program and use the `global` keyword inside the function, but this is not recommended.)
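For completeness, a minimal sketch of the `global` approach advised against above. It works, but any function can then silently rebind `record`, which makes the script harder to follow and to test:

```python
record = None  # module-level name that main() will rebind


def main():
    # Without this declaration, the assignment below would create a new
    # local variable and leave the module-level 'record' untouched
    global record
    record = "ACGTACGT"


main()
print(record)  # ACGTACGT
```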