Biopython: resseq doesn't match pdb file

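I have a PDB file and I need to extract its residue sequence numbers (resseqs). From manually inspecting the first few lines of the PDB file (pasted below), I believe the resseqs should be [22, 23, ...]. However, Biopython's Bio.PDB module suggests otherwise (output also attached below). Is this a Biopython bug, or am I misunderstanding the PDB format?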

ATOM      1  N   GLY A  22      78.171  89.858  59.231  1.00 21.24           N  
ATOM      2  CA  GLY A  22      79.174  88.827  58.999  1.00 20.87           C  
ATOM      3  C   GLY A  22      80.438  89.415  58.391  1.00 21.89           C  
ATOM      4  O   GLY A  22      80.362  90.202  57.440  1.00 23.18           O  
ATOM      5  N   LEU A  23      81.588  89.069  58.972  1.00 21.51           N  
ATOM      6  CA  LEU A  23      82.895  89.555  58.527  1.00 20.80           C  
ATOM      7  C   LEU A  23      83.288  89.020  57.162  1.00 22.41           C  
ATOM      8  O   LEU A  23      82.889  87.923  56.788  1.00 22.93           O  
ATOM      9  CB  LEU A  23      83.973  89.232  59.560  1.00 20.97           C  
ATOM     10  CG  LEU A  23      84.225  87.818  60.062  1.00 13.32           C  
ATOM     11  CD1 LEU A  23      85.448  87.888  60.939  1.00 15.24           C  
ATOM     12  CD2 LEU A  23      83.035  87.258  60.829  1.00 12.21           C

The code I used to extract the resseqs:

...
for i in chain:
    print(i.get_full_id())

OUT:('pdb', 0, 'A', (' ', 2, ' '))
    ('pdb', 0, 'A', (' ', 3, ' '))
...

From the documentation of Bio.PDB.Entity.get_full_id:

def get_full_id(self):
    """Return the full id.

    The full id is a tuple containing all id's starting from
    the top object (Structure) down to the current object. A full id for
    a Residue object e.g. is something like:

    ("1abc", 0, "A", (" ", 10, "A"))

    This corresponds to:

    Structure with id "1abc"
    Model with id 0
    Chain with id "A"
    Residue with id (" ", 10, "A")

    The Residue id indicates that the residue is not a hetero-residue
    (or a water) because it has a blank hetero field, that its sequence
    identifier is 10 and its insertion code "A".
    """
    # The function implementation below here ...
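
To make the tuple layout described in the docstring concrete, the example id it quotes can be unpacked by hand. This is only a sketch in plain Python (no Biopython needed); the variable names are my own and the tuple is the docstring's example, not data from the file above.

```python
# Example full id copied from the docstring (not taken from the PDB file in question).
full_id = ("1abc", 0, "A", (" ", 10, "A"))

# Structure id, model id, chain id, then the nested residue id.
structure_id, model_id, chain_id, residue_id = full_id

# The residue id is (hetero field, sequence number, insertion code).
hetfield, resseq, icode = residue_id

print(resseq)  # 10
```

So for a Residue, the resseq is the middle element of the last item of the full id.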

I assume you are iterating over the atoms in the chain rather than the residues, which gives you the full id of each Atom instead of the Residue.

If you save the example residues in a file named struct.pdb and run the code below, you will get the correct ids.

>>> from Bio.PDB import PDBParser
>>> structure = PDBParser().get_structure('test', 'struct.pdb')
>>> for residue in structure.get_residues():
...    print(residue.get_full_id())
('test', 0, 'A', (' ', 22, ' '))
('test', 0, 'A', (' ', 23, ' '))
>>> resseqs = [residue.id[1] for residue in structure.get_residues()]
>>> print(resseqs)
[22, 23]
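
If you want to cross-check the result independently of Biopython, the manual inspection from the question can be sketched in a few lines. This assumes the fixed-column ATOM/HETATM layout of the PDB format (chain id in column 22, residue sequence number in columns 23-26, insertion code in column 27); the function name `resseqs_from_pdb_lines` is hypothetical, and the sample lines are three records copied from the question.

```python
def resseqs_from_pdb_lines(lines):
    """Return residue sequence numbers in order of first appearance."""
    seen = []
    for line in lines:
        # Only coordinate records carry a residue sequence number.
        if not line.startswith(("ATOM", "HETATM")):
            continue
        chain_id = line[21:22]       # column 22
        resseq = int(line[22:26])    # columns 23-26
        icode = line[26:27]          # column 27
        key = (chain_id, resseq, icode)
        if key not in seen:
            seen.append(key)
    return [resseq for _, resseq, _ in seen]


sample = [
    "ATOM      1  N   GLY A  22      78.171  89.858  59.231  1.00 21.24           N",
    "ATOM      2  CA  GLY A  22      79.174  88.827  58.999  1.00 20.87           C",
    "ATOM      5  N   LEU A  23      81.588  89.069  58.972  1.00 21.51           N",
]

print(resseqs_from_pdb_lines(sample))  # [22, 23]
```

If this and Biopython disagree on your real file, the file itself (for example shifted columns) is the more likely culprit than the parser.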