Getting protein sequences by accessing Uniprot (with Python)

Question

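I have a list of protein IDs and I am trying to access the protein sequences from Uniprot using Python. I came across this post: Protein sequence from uniprot protein id python, but it gives a list of elements rather than the actual sequence: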

Code

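import requests as r
from Bio import SeqIO
from io import StringIO

cID = 'P04637'

# Build the FASTA download URL for this accession
baseUrl = "http://www.uniprot.org/uniprot/"
currentUrl = baseUrl + cID + ".fasta"
response = r.get(currentUrl)  # a plain GET is enough to download the entry
cData = response.text  # FASTA text: one header line plus the sequence lines

# Parse the FASTA text into a list of SeqRecord objects
Seq = StringIO(cData)
pSeq = list(SeqIO.parse(Seq, 'fasta'))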

This gives the output:

Output

[SeqRecord(seq=Seq('MQAALIGLNFPLQRRFLSGVLTTTSSAKRCYSGDTGKPYDCTSAEHKKELEECY...SSS', SingleLetterAlphabet()), id='sp|O45228|PROD_CAEEL', name='sp|O45228|PROD_CAEEL', description='sp|O45228|PROD_CAEEL Proline dehydrogenase 1, mitochondrial OS=Caenorhabditis elegans OX=6239 GN=prdh-1 PE=2 SV=2', dbxrefs=[])]

I'm just wondering how I can actually get the sequence itself.

Answer 1

[record.seq for record in pSeq]

Edit: you'll want str(pSeq[0].seq) to get the sequence as a plain string.
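
Since the question starts from a list of IDs, here is a rough sketch of looping over them and collecting plain sequence strings. It deliberately swaps Biopython and requests for the standard library, mostly to show that the string you get from str(record.seq) is just the FASTA body lines joined together. The names fasta_to_sequence, fetch_sequences and the fetch parameter are made up for this illustration, and BASE_URL assumes UniProt's current REST endpoint; adjust it if your setup differs.

```python
from urllib.request import urlopen

# Assumption: UniProt's current REST endpoint for single-entry FASTA downloads.
BASE_URL = "https://rest.uniprot.org/uniprotkb/"


def fasta_to_sequence(fasta_text):
    """Return the sequence of the first FASTA record, with the header dropped."""
    lines = fasta_text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    seq_lines = []
    for line in lines[1:]:
        if line.startswith(">"):  # a second record starts here, so stop
            break
        seq_lines.append(line.strip())
    return "".join(seq_lines)


def fetch_sequences(ids, fetch=None):
    """Map each accession in `ids` to its sequence string.

    `fetch` is a hypothetical hook: any callable that takes a URL and
    returns FASTA text. By default it does an HTTP GET with urllib.
    """
    if fetch is None:
        def fetch(url):
            with urlopen(url) as resp:
                return resp.read().decode("utf-8")
    return {cid: fasta_to_sequence(fetch(BASE_URL + cid + ".fasta")) for cid in ids}
```

Calling fetch_sequences(['P04637']) needs network access and returns a dict keyed by ID. If you stay with Biopython, the same loop works with str(record.seq) in place of fasta_to_sequence.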


