Biopython: is there a one-liner to extract the amino acid sequence of a specific chain from a PDB file?

Question

I want to extract the single-letter amino acid sequence of a specific chain from a bunch of PDB files.

I can do this with SeqIO.parse(), but it feels rather unpythonic to me:

from Bio import SeqIO

PDB_file_path = '/full/path/to/some/pdb'

# Is there a 1-liner for this?
query_seqres = SeqIO.parse(PDB_file_path, 'pdb-seqres')

for chain in query_seqres:
    if chain.id == query_chain_id:
        query_chain = chain.seq

Is there a cleaner, more concise way to do this?

Answer 1

I don't think it is any more Pythonic, but you can use a dictionary comprehension to turn the generator into an explicit dict:

from Bio import SeqIO
PDB_file_path = '6q62.pdb' 
query_chain_id = '6Q62:A'

chain = {record.id: record.seq for record in SeqIO.parse(PDB_file_path, 'pdb-seqres')}
query_chain = chain[query_chain_id]
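
If only one chain is needed, building the whole dict is not strictly necessary. As a minimal sketch (not part of the original answer), next() with a generator expression stops at the first matching record and comes closest to a true one-liner. The helper names first_chain_seq and seqres_for_chain, and the FakeRecord stand-ins, are hypothetical and exist only for illustration; with real data the records would come from SeqIO.parse(..., 'pdb-seqres').

```python
from collections import namedtuple


def first_chain_seq(records, chain_id, default=None):
    """Return the .seq of the first record whose .id equals chain_id."""
    # next() pulls from the iterator only until the first match,
    # so later records are never visited.
    return next((rec.seq for rec in records if rec.id == chain_id), default)


def seqres_for_chain(pdb_path, chain_id):
    """Hypothetical wrapper around SeqIO.parse (requires Biopython)."""
    from Bio import SeqIO  # imported lazily so the helper above has no dependency

    return first_chain_seq(SeqIO.parse(pdb_path, "pdb-seqres"), chain_id)


# Stand-in records for illustration; real ones would be SeqRecord objects.
FakeRecord = namedtuple("FakeRecord", ["id", "seq"])
demo_records = [FakeRecord("6Q62:A", "MKT"), FakeRecord("6Q62:B", "GSH")]

demo_seq = first_chain_seq(iter(demo_records), "6Q62:B")
demo_missing = first_chain_seq(iter(demo_records), "6Q62:Z")  # no such chain
```

Passing a default avoids a StopIteration when the chain ID is absent; with the dict approach the same situation raises a KeyError instead.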

Answer 2

Expanding on @BioGeek's answer, here is the equivalent code for extracting the sequence with PDBParser.get_structure() instead of SeqIO.parse():

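from Bio.PDB import PDBParser
from Bio.SeqUtils import seq1

# PDB_ID, PDB_file_path and query_chain_id are assumed to be defined already.
pdbparser = PDBParser()

structure = pdbparser.get_structure(PDB_ID, PDB_file_path)
# Map each chain ID to its one-letter sequence (three-letter residue names joined, then converted).
chains = {chain.id: seq1(''.join(residue.resname for residue in chain)) for chain in structure.get_chains()}

# Note: chain IDs here are bare letters such as 'A', not '6Q62:A' as in the pdb-seqres records.
query_chain = chains[query_chain_id]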
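
Two caveats apply to the PDBParser route. The sequence is built from the coordinate records, so it covers only residues that were actually resolved and may be shorter than the SEQRES sequence. Iterating over a chain also yields waters and ligands, which seq1() renders as X. The sketch below is one possible way to skip those, relying on the fact that Bio.PDB residue IDs are tuples whose first element is ' ' for standard residues. The function chain_sequence, the THREE_TO_ONE table, and the FakeResidue stand-ins are hypothetical and used only for illustration.

```python
from collections import namedtuple

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequence(residues, unknown="X"):
    """Build a one-letter sequence from residue-like objects.

    Each object needs .id (a (hetfield, resseq, icode) tuple, as in Bio.PDB)
    and .resname. Residues with a non-blank hetfield (waters, ligands,
    modified residues stored as HETATM) are skipped.
    """
    return "".join(
        THREE_TO_ONE.get(res.resname.strip().upper(), unknown)
        for res in residues
        if res.id[0] == " "
    )


# Stand-in residues for illustration; real ones come from a Bio.PDB chain.
FakeResidue = namedtuple("FakeResidue", ["id", "resname"])
demo_chain = [
    FakeResidue((" ", 1, " "), "ALA"),
    FakeResidue((" ", 2, " "), "GLY"),
    FakeResidue(("W", 101, " "), "HOH"),  # water, skipped
]
demo_sequence = chain_sequence(demo_chain)
```

With a parsed structure, something like chain_sequence(structure[0]['A']) would apply the same filter to chain A of the first model. Whether modified residues such as MSE should be dropped or mapped back to their parent amino acid depends on the use case.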

python

bioinformatics

biopython