How can I return the corresponding FASTA protein sequences from NCBI for multiple accession numbers in Python?

Question

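I'm having some difficulty using a Python script to download FASTA sequences for multiple accession numbers listed in a text file. I can do this for a single accession number, for example: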

import sys
from Bio import Entrez
Entrez.email = "X@Y.com"
handle = Entrez.efetch(db="protein", id="EAS03220", rettype="fasta")
print(handle.read())

But when I try to give it a file containing a list of accession numbers (see below), I get an error.

import sys
from Bio import Entrez
Entrez.email = "X@Y.com"    

accessions = []
for line in open(sys.argv[1],"r"):
    line = line.strip()
    accessions.append(line)

for num in accessions:
    handle = Entrez.efetch(db="protein", id="num", rettype="fasta")
    print(handle.read())

Here is an example of what my input file looks like:

EAS06781
EAS07087
EAS07113
EAS07200
EAS07226
EAS07230

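I'm sure the solution is simple, but I have spent hours reading forums, the NCBI help pages, and Python beginner books without getting anywhere. Thanks in advance.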

Answer 1

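You are passing "num" as a string literal rather than as the variable num, so every request asks NCBI for a record whose ID is literally "num". Remove the quotes and it should work.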

handle = Entrez.efetch(db="protein", id=num, rettype="fasta")
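
For completeness, here is a minimal sketch of how the whole script might look with that fix applied. The helper names read_accessions and fetch_fasta are hypothetical (they are not part of Biopython), and sending all IDs as one comma-separated efetch request is an assumption made for brevity; the per-accession loop from the question works just as well once the quotes are removed.

```python
def read_accessions(path):
    """Return the non-empty, whitespace-stripped lines of a text file."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def fetch_fasta(accessions, email):
    """Fetch FASTA records for all accessions in a single efetch request."""
    # Imported here so read_accessions() can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(
        db="protein",
        id=",".join(accessions),  # the actual values, not the literal "num"
        rettype="fasta",
        retmode="text",
    )
    try:
        return handle.read()
    finally:
        handle.close()
```

It could then be called as print(fetch_fasta(read_accessions(sys.argv[1]), "X@Y.com")) after an import sys. Skipping blank lines avoids sending an empty ID if the input file ends with an extra newline; for very long lists it may be worth splitting the accessions into smaller batches.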


python

biopython

fasta

ncbi