Developing a bioinformatics app that identifies DNA sequences by the barcode at their start

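I'm doing an assignment for an intro Python class, and I'm having a lot of trouble writing a script that reads my file and then identifies the barcode at the beginning of each sequence in it.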

This is what I have to open the file:

#!/usr/bin/python
import sys

fname  = sys.argv[1]

handle = open(fname, "r")
# read the file #
for line in handle:
    print(line.strip())

handle.close()

It opens my file and prints the contents to the screen perfectly.
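The same read loop can also be written with a with block, which closes the file automatically even if something goes wrong partway through. A minimal sketch; read_lines is a hypothetical helper name, not something the assignment asks for:

```python
def read_lines(path):
    """Return the lines of a text file with surrounding whitespace removed."""
    # The with block closes the file automatically, even if an exception occurs.
    with open(path, "r") as handle:
        # strip() removes the trailing newline (and any other edge whitespace).
        return [line.strip() for line in handle]
```

In the script above, for line in read_lines(fname): print(line) would replace the open/loop/close lines.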

The problem comes when I add to this to complete the assignment: I get an error message, and I'm not sure what I'm doing wrong.

Any help or advice would be much appreciated.

Here is the assignment, with the expected results and detailed instructions:

Create an executable file named ~/assignments/assignment07/assignment07.py

The Python script should take 2 command-line arguments (in order):

(1) a DNA barcode, (2) the name of a file containing DNA sequences

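Your script should print every DNA sequence in the sequence file that begins with the given barcode, but with the barcode discarded. Do not print the barcode; print only the sequences that match it, and do not match barcodes that are not at the front of the sequence.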
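The core of the assignment fits in one small function. A minimal sketch, assuming the sequence file holds one DNA sequence per line (the file format is not stated in the assignment); strip_barcode is a hypothetical name. str.startswith only matches at position 0, so a barcode that appears later in a sequence is ignored, as required:

```python
def strip_barcode(lines, barcode):
    """Yield each sequence that starts with barcode, with the barcode removed.

    Assumes one DNA sequence per line; surrounding whitespace (including the
    trailing newline) is stripped before comparing.
    """
    for line in lines:
        seq = line.strip()
        # startswith checks only the beginning of the string, so a barcode
        # that occurs in the middle or at the end does not count as a match.
        if seq.startswith(barcode):
            # Slice off the barcode and keep only the rest of the sequence.
            yield seq[len(barcode):]
```

Used from the script, the loop becomes: with open(filename, "r") as handle, then for seq in strip_barcode(handle, barcode): print(seq). Because an open file yields its lines, the same function works on a file handle or on a plain list of strings.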

#!/usr/bin/python
import sys
barcode  = sys.argv[1]
filename = sys.argv[2]

bclen = len(bacode)

handle = open(fname, "r")

# read the file #
for line in handle:
        print(line.strip())

for line in filename:
        bc    = line[4:][:bclen]
        seq   = line[4:19][bclen:]

        if bc == barcode:
                seqslice = sequence[4:]
                #print("barcode %s is at beginning of sequence %s" % (barcode, seqslice))

handle.close()

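This script is full of common beginner mistakes (mismatched variable names and a misunderstanding of how slicing works), but here is a corrected version, with comments that should help: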

This works when run as: python script_name.py 123barcode filename.csv

#!/usr/bin/env python3
import sys

barcode  = sys.argv[1]
filename = sys.argv[2]

bclen = len(barcode)  # fixed typo: was "bacode"

# changed fname -> filename; "with" closes the file automatically
with open(filename, "r") as handle:
    # read the file #
    ## Combined the two for loops: the second one iterated over the characters
    ## of the filename string, not over the lines of the file
    for line in handle:
        line = line.strip()    # strip the newline first so seq doesn't end in "\n"
        print(line)

        bc  = line[:bclen]     # slice from the start up to the barcode length
        seq = line[bclen:]     # from the end of the barcode to the end of the line
                               # (to stop at position 19 instead, use line[bclen:19])

        print("BC = " + bc)    # Added these debug prints: when problems occur,
        print("SEQ = " + seq)  # always check what your variables actually contain

        # I don't know exactly what you wanted here, but this prints the matching sequence
        if bc == barcode:
            print("barcode %s is at beginning of sequence %s" % (barcode, seq))
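The heart of the fix is the slicing. A standalone illustration using a made-up line (the sequence and barcode are hypothetical, not from the asker's file), comparing the corrected slices with the double slice from the original loop:

```python
line = "ACGTTTGCAGGATCCAGT\n"  # hypothetical line as read from a file
barcode = "ACGT"
bclen = len(barcode)

clean = line.strip()   # remove the trailing newline first
bc = clean[:bclen]     # characters 0 .. bclen-1: the candidate barcode
seq = clean[bclen:]    # everything after the barcode

print(bc)   # ACGT
print(seq)  # TTGCAGGATCCAGT

# The original loop sliced twice, e.g. line[4:][:bclen]: the first slice
# drops 4 characters, so the barcode at the very start is never examined.
print(clean[4:][:bclen])  # TTGC
```

For a plain prefix test, clean.startswith(barcode) gives the same result as clean[:bclen] == barcode.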