Biopython: resseq doesn't match PDB file
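I have a PDB file and I need to extract its residue sequence numbers (resseqs). Based on a manual inspection of the first few lines of the PDB file (pasted below), I think the resseqs should be [22, 23, ...]. However, Biopython's Bio.PDB module suggests otherwise (its output is also included below). I'm wondering whether this is a Biopython bug or whether I'm misunderstanding the PDB format.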
ATOM 1 N GLY A 22 78.171 89.858 59.231 1.00 21.24 N
ATOM 2 CA GLY A 22 79.174 88.827 58.999 1.00 20.87 C
ATOM 3 C GLY A 22 80.438 89.415 58.391 1.00 21.89 C
ATOM 4 O GLY A 22 80.362 90.202 57.440 1.00 23.18 O
ATOM 5 N LEU A 23 81.588 89.069 58.972 1.00 21.51 N
ATOM 6 CA LEU A 23 82.895 89.555 58.527 1.00 20.80 C
ATOM 7 C LEU A 23 83.288 89.020 57.162 1.00 22.41 C
ATOM 8 O LEU A 23 82.889 87.923 56.788 1.00 22.93 O
ATOM 9 CB LEU A 23 83.973 89.232 59.560 1.00 20.97 C
ATOM 10 CG LEU A 23 84.225 87.818 60.062 1.00 13.32 C
ATOM 11 CD1 LEU A 23 85.448 87.888 60.939 1.00 15.24 C
ATOM 12 CD2 LEU A 23 83.035 87.258 60.829 1.00 12.21 C
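The PDB format is column-based rather than whitespace-separated: in ATOM/HETATM records the chain ID is column 22, resSeq is columns 23-26 and the insertion code is column 27. The lines pasted above appear to have had their spacing collapsed, so the sketch below re-aligns two of them by hand to the standard columns (a reconstruction, not the exact file) and reads the fields by position. `read_resseq` is a hypothetical helper for checking what a column-based reader would see; it is not part of Biopython.

```python
def read_resseq(line: str) -> tuple[str, int, str]:
    """Return (chain_id, resseq, icode) from a fixed-width ATOM/HETATM record.

    PDB columns are 1-based and Python slices are 0-based, so
    column 22 is line[21], columns 23-26 are line[22:26], and so on.
    """
    chain_id = line[21]          # column 22
    resseq = int(line[22:26])    # columns 23-26, right-justified
    icode = line[26:27].strip()  # column 27, usually blank
    return chain_id, resseq, icode


# Two records from the question, re-aligned by hand to the standard PDB columns.
lines = [
    "ATOM      1  N   GLY A  22      78.171  89.858  59.231  1.00 21.24           N",
    "ATOM      5  N   LEU A  23      81.588  89.069  58.972  1.00 21.51           N",
]

for line in lines:
    print(read_resseq(line))
```

If the real file's residue numbers are shifted by even one column, a column-based reader will pick up a different number than the one you see by eye.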
The code I used to extract the resseqs:
...
for i in chain:
    print(i.get_full_id())
Output:
('pdb', 0, 'A', (' ', 2, ' '))
('pdb', 0, 'A', (' ', 3, ' '))
...
From the documentation of Bio.PDB.Entity.get_full_id:
def get_full_id(self):
    """Return the full id.

    The full id is a tuple containing all id's starting from
    the top object (Structure) down to the current object. A full id for
    a Residue object e.g. is something like:

    ("1abc", 0, "A", (" ", 10, "A"))

    This corresponds to:

    Structure with id "1abc"
    Model with id 0
    Chain with id "A"
    Residue with id (" ", 10, "A")

    The Residue id indicates that the residue is not a hetero-residue
    (or a water) because it has a blank hetero field, that its sequence
    identifier is 10 and its insertion code "A".
    """
    # The function implementation below here ...
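The tuple layout described in that docstring can be unpacked directly. The values below are the docstring's own example, not output from a real file; the "W" and "H_<resname>" hetfield values noted in the comment are how Biopython marks waters and other hetero residues.

```python
# Example full id taken from the get_full_id docstring.
full_id = ("1abc", 0, "A", (" ", 10, "A"))

structure_id, model_id, chain_id, residue_id = full_id
hetfield, resseq, icode = residue_id

# A blank hetfield means a standard residue; Biopython uses "W" for
# waters and "H_<resname>" (e.g. "H_GLC") for other hetero residues.
is_standard = hetfield == " "

print(structure_id, model_id, chain_id, resseq, repr(icode), is_standard)
```

So the residue number is always the middle element of the last tuple in a Residue's full id.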
I assume you are iterating over the atoms in the chain rather than the residues, which gives you the full id of each Atom instead of each Residue.
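A minimal sketch of the difference between residue and atom full ids, assuming a reasonably recent Biopython where `PDBParser.get_structure` accepts an open file handle as well as a filename. To stay self-contained it parses an in-memory copy of four of the question's atoms, re-aligned by hand to the fixed PDB columns (the alignment is a reconstruction, since the paste lost its spacing). `Chain.get_residues()` yields Residue objects, while `Chain.get_atoms()` yields Atom objects whose full id carries one extra `(atom name, altloc)` element.

```python
from io import StringIO

from Bio.PDB import PDBParser

# N and CA atoms of the two residues from the question, re-aligned to the
# fixed-width PDB columns (resSeq in columns 23-26).
PDB_TEXT = """\
ATOM      1  N   GLY A  22      78.171  89.858  59.231  1.00 21.24           N
ATOM      2  CA  GLY A  22      79.174  88.827  58.999  1.00 20.87           C
ATOM      5  N   LEU A  23      81.588  89.069  58.972  1.00 21.51           N
ATOM      6  CA  LEU A  23      82.895  89.555  58.527  1.00 20.80           C
"""

# QUIET=True suppresses construction warnings for this minimal file.
structure = PDBParser(QUIET=True).get_structure("test", StringIO(PDB_TEXT))
chain = structure[0]["A"]

# Residue full ids end with (hetfield, resseq, icode).
residue_ids = [residue.get_full_id() for residue in chain.get_residues()]

# Atom full ids repeat the parent residue's full id and append (name, altloc).
atom_ids = [atom.get_full_id() for atom in chain.get_atoms()]

for rid in residue_ids:
    print(rid)
print(atom_ids[0])
```

Either way, the resseq lives inside the residue id tuple, so reading it from the wrong position of an Atom's full id is an easy way to get unexpected numbers.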
If you save the example residues in a file named struct.pdb and run the code below, you get the correct ids.
>>> from Bio.PDB import PDBParser
>>> structure = PDBParser().get_structure('test', 'struct.pdb')
>>> for residue in structure.get_residues():
...     print(residue.get_full_id())
...
('test', 0, 'A', (' ', 22, ' '))
('test', 0, 'A', (' ', 23, ' '))
>>> resseqs = [residue.id[1] for residue in structure.get_residues()]
>>> print(resseqs)
[22, 23]
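If only the residue numbers are needed, a slightly more defensive variant of the list comprehension above keeps the insertion code and skips waters and other hetero residues, whose `residue.id[0]` is not blank. This is a sketch rather than part of the original answer: `standard_resseqs` is a hypothetical helper name, and the water (HOH) record in the input is made up purely for illustration.

```python
from io import StringIO

from Bio.PDB import PDBParser

# Two atoms from the question plus a made-up water record for illustration.
PDB_TEXT = """\
ATOM      1  N   GLY A  22      78.171  89.858  59.231  1.00 21.24           N
ATOM      5  N   LEU A  23      81.588  89.069  58.972  1.00 21.51           N
HETATM  101  O   HOH A 201      80.000  90.000  60.000  1.00 30.00           O
"""


def standard_resseqs(structure, model_id=0):
    """Map each chain id to [(resseq, icode), ...] for standard residues only."""
    result = {}
    for chain in structure[model_id]:
        # residue.id is (hetfield, resseq, icode); hetfield is " " for
        # standard residues, "W" for water and "H_<resname>" for other HETATMs.
        result[chain.id] = [
            (residue.id[1], residue.id[2])
            for residue in chain
            if residue.id[0] == " "
        ]
    return result


structure = PDBParser(QUIET=True).get_structure("test", StringIO(PDB_TEXT))
print(standard_resseqs(structure))
```

Keeping the insertion code alongside the number matters for files where two residues share a resseq and differ only by icode.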