How to renumber residues (starting from 1, continuing across chains) in a pdb file in Python?

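I want to renumber a pdb file with multiple chains (A, H, L) consecutively. Some residue positions in the chains carry insertion codes (e.g. 190A). Can anyone help me write this code? Example of a pdb file with insertion codes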

My attempt using Biopython:

Input file: testA.pdb

ATOM     25  N   ALA E   5      48.087  97.950  74.514  1.00  9.33           N  
ATOM     26  CA  ALA E   5      48.052  99.292  73.904  1.00  9.37           C  
ATOM     27  C   ALA E   5      47.483 100.285  74.935  1.00  9.65           C  
ATOM     28  O   ALA E   5      47.693 101.493  74.908  1.00  9.11           O  
ATOM     29  CB  ALA E   5      47.247  99.339  72.623  1.00  8.31           C  
ATOM     30  N   ILE E   6      46.802  99.657  75.862  1.00  9.99           N  
ATOM     31  CA  ILE E   6      46.118 100.279  77.004  1.00 10.34           C  
ATOM     32  C   ILE E   6      46.521  99.491  78.253  1.00 10.35           C  
ATOM     33  O   ILE E   6      46.292  98.274  78.348  1.00  9.61           O  
ATOM     34  CB  ILE E   6      44.613 100.230  76.772  1.00 11.05           C  
ATOM     35  CG1 ILE E   6      44.269 100.841  75.413  1.00 11.39           C  
ATOM     36  CG2 ILE E   6      43.845 100.913  77.879  1.00 11.06           C  
ATOM     37  CD1 ILE E   6      42.926 100.408  74.814  1.00 11.29           C
ATOM     30  N   ILE E   6A     46.802  99.657  75.862  1.00  9.99           N  
ATOM     31  CA  ILE E   6A     46.118 100.279  77.004  1.00 10.34           C  
ATOM     32  C   ILE E   6A     46.521  99.491  78.253  1.00 10.35           C  
ATOM     33  O   ILE E   6A     46.292  98.274  78.348  1.00  9.61           O  
ATOM     34  CB  ILE E   6A     44.613 100.230  76.772  1.00 11.05           C  
ATOM     35  CG1 ILE E   6A     44.269 100.841  75.413  1.00 11.39           C  
ATOM     36  CG2 ILE E   6A     43.845 100.913  77.879  1.00 11.06           C  
ATOM     37  CD1 ILE E   6A     42.926 100.408  74.814  1.00 11.29           C  
ATOM     38  N   GLN E   7      47.184 100.177  79.159  1.00 10.08           N  
ATOM     39  CA  GLN E   7      47.750  99.648  80.383  1.00 10.85           C  
ATOM     40  C   GLN E   7      46.749  99.311  81.476  1.00 10.94           C  
ATOM     41  O   GLN E   7      45.812 100.068  81.762  1.00 10.33           O  
ATOM     42  CB  GLN E   7      48.855 100.550  80.962  1.00 11.19           C  
ATOM     43  CG  GLN E   7      50.227 100.292  80.353  1.00 11.71           C  
ATOM     44  CD  GLN E   7      50.656 101.322  79.346  1.00 12.04           C  
ATOM     45  OE1 GLN E   7      50.015 101.625  78.348  1.00 11.94           O  
ATOM     46  NE2 GLN E   7      51.811 101.943  79.591  1.00 12.40           N  
ATOM     47  N   PRO E   8      46.990  98.145  82.066  1.00 11.13           N  
ATOM     48  CA  PRO E   8      46.204  97.689  83.212  1.00 11.66           C  
ATOM     49  C   PRO E   8      46.688  98.594  84.352  1.00 11.77           C  
ATOM     50  O   PRO E   8      47.885  98.899  84.409  1.00 11.72           O  
ATOM     51  CB  PRO E   8      46.586  96.236  83.432  1.00 11.66           C  
ATOM     52  CG  PRO E   8      47.935  96.031  82.787  1.00 11.65           C  
ATOM     53  CD  PRO E   8      48.114  97.207  81.829  1.00 11.20           C  

My code:

from Bio.PDB import PDBIO, PDBParser


# silence parser warnings so that non-standard pdb files still load
import warnings
warnings.filterwarnings('ignore')


io = PDBIO()
parser = PDBParser()


# my_pdb_structure = parser.get_structure('test', 'test.pdb')
my_pdb_structure = parser.get_structure('test', 'testA.pdb')

print(my_pdb_structure)


# renumber residues in my_pdb_structure
# residue.id is a tuple: (hetero flag, sequence number, insertion code)
residue_N = 1
for model in my_pdb_structure:
    for chain in model:
        for residue in chain:
            print(residue.id)
            if 'A' in residue.id[2]:
                # inserted residue: reuse the previous number and keep its code
                residue.id = (residue.id[0], residue_N - 1, residue.id[2])
                print('----', residue.id)
            else:
                residue.id = (residue.id[0], residue_N, residue.id[2])
                print('----', residue.id)
                residue_N += 1


# this bit just prints the renumbered my_pdb_structure
print('\n structure with renumbered residues : \n___________________________________')
for model in my_pdb_structure:
    for chain in model:
        for residue in chain:
            print(residue, residue.id)

        
io.set_structure(my_pdb_structure)
# io.save('renumbered.pdb') 
io.save('renumberedA.pdb',  preserve_atom_numbering=True) 

Output renumberedA.pdb:

ATOM     25  N   ALA E   1      48.087  97.950  74.514  1.00  9.33           N  
ATOM     26  CA  ALA E   1      48.052  99.292  73.904  1.00  9.37           C  
ATOM     27  C   ALA E   1      47.483 100.285  74.935  1.00  9.65           C  
ATOM     28  O   ALA E   1      47.693 101.493  74.908  1.00  9.11           O  
ATOM     29  CB  ALA E   1      47.247  99.339  72.623  1.00  8.31           C  
ATOM     30  N   ILE E   2      46.802  99.657  75.862  1.00  9.99           N  
ATOM     31  CA  ILE E   2      46.118 100.279  77.004  1.00 10.34           C  
ATOM     32  C   ILE E   2      46.521  99.491  78.253  1.00 10.35           C  
ATOM     33  O   ILE E   2      46.292  98.274  78.348  1.00  9.61           O  
ATOM     34  CB  ILE E   2      44.613 100.230  76.772  1.00 11.05           C  
ATOM     35  CG1 ILE E   2      44.269 100.841  75.413  1.00 11.39           C  
ATOM     36  CG2 ILE E   2      43.845 100.913  77.879  1.00 11.06           C  
ATOM     37  CD1 ILE E   2      42.926 100.408  74.814  1.00 11.29           C  
ATOM     30  N   ILE E   2A     46.802  99.657  75.862  1.00  9.99           N  
ATOM     31  CA  ILE E   2A     46.118 100.279  77.004  1.00 10.34           C  
ATOM     32  C   ILE E   2A     46.521  99.491  78.253  1.00 10.35           C  
ATOM     33  O   ILE E   2A     46.292  98.274  78.348  1.00  9.61           O  
ATOM     34  CB  ILE E   2A     44.613 100.230  76.772  1.00 11.05           C  
ATOM     35  CG1 ILE E   2A     44.269 100.841  75.413  1.00 11.39           C  
ATOM     36  CG2 ILE E   2A     43.845 100.913  77.879  1.00 11.06           C  
ATOM     37  CD1 ILE E   2A     42.926 100.408  74.814  1.00 11.29           C  
ATOM     38  N   GLN E   3      47.184 100.177  79.159  1.00 10.08           N  
ATOM     39  CA  GLN E   3      47.750  99.648  80.383  1.00 10.85           C  
ATOM     40  C   GLN E   3      46.749  99.311  81.476  1.00 10.94           C  
ATOM     41  O   GLN E   3      45.812 100.068  81.762  1.00 10.33           O  
ATOM     42  CB  GLN E   3      48.855 100.550  80.962  1.00 11.19           C  
ATOM     43  CG  GLN E   3      50.227 100.292  80.353  1.00 11.71           C  
ATOM     44  CD  GLN E   3      50.656 101.322  79.346  1.00 12.04           C  
ATOM     45  OE1 GLN E   3      50.015 101.625  78.348  1.00 11.94           O  
ATOM     46  NE2 GLN E   3      51.811 101.943  79.591  1.00 12.40           N  
ATOM     47  N   PRO E   4      46.990  98.145  82.066  1.00 11.13           N  
ATOM     48  CA  PRO E   4      46.204  97.689  83.212  1.00 11.66           C  
ATOM     49  C   PRO E   4      46.688  98.594  84.352  1.00 11.77           C  
ATOM     50  O   PRO E   4      47.885  98.899  84.409  1.00 11.72           O  
ATOM     51  CB  PRO E   4      46.586  96.236  83.432  1.00 11.66           C  
ATOM     52  CG  PRO E   4      47.935  96.031  82.787  1.00 11.65           C  
ATOM     53  CD  PRO E   4      48.114  97.207  81.829  1.00 11.20           C  
TER      53      PRO E   4                                                       
END   



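The code simply loads the pdb file with PDBParser(), iterates over the pdb structure object, changes each residue's id starting from 1 and adding 1 on every loop, and then saves the renumbered structure with PDBIO() (set the structure first, then save).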

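I don't know the internals of Biopython's PDB parsing or of the PDB structure object. My code only works for input like your test file, i.e. where a residue with an A always comes right after the same residue without the A. Run it on different input pdb files to check.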
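
For inputs where the insertion letter is not always A, one alternative is to work on the text of the file directly. The following is only a sketch, not part of the code above: it assumes standard fixed-width PDB columns (chain ID in column 22, residue number in columns 23-26, insertion code in column 27), and the function name renumber_keep_icodes and the chain H line in the sample are made up for illustration. It gives a new number whenever the chain or the original residue number changes and leaves whatever insertion code is there untouched.

```python
def renumber_keep_icodes(pdb_lines):
    """Renumber residues from 1, continuing across chains, keeping insertion codes."""
    out = []
    counter = 0
    previous = None  # (chain ID, original residue number) of the last residue seen
    for line in pdb_lines:
        is_residue_record = line.startswith(("ATOM", "HETATM", "TER"))
        # Skip records without a residue number (e.g. a bare "TER" line).
        if is_residue_record and line[22:26].strip():
            key = (line[21], line[22:26])
            if key != previous:
                counter += 1
                previous = key
            # Overwrite only columns 23-26; column 27 (insertion code) is kept.
            # Note: numbers above 9999 would not fit in these four columns.
            line = line[:22] + f"{counter:>4}" + line[26:]
        out.append(line)
    return out


# Four lines taken from testA.pdb plus one invented chain H line.
sample = [
    "ATOM     25  N   ALA E   5      48.087  97.950  74.514  1.00  9.33           N  ",
    "ATOM     30  N   ILE E   6      46.802  99.657  75.862  1.00  9.99           N  ",
    "ATOM     30  N   ILE E   6A     46.802  99.657  75.862  1.00  9.99           N  ",
    "ATOM     38  N   GLN E   7      47.184 100.177  79.159  1.00 10.08           N  ",
    "ATOM     47  N   PRO H   1      46.990  98.145  82.066  1.00 11.13           N  ",
]
renumbered = renumber_keep_icodes(sample)
for line in renumbered:
    print(line)
```

Because only the number columns are rewritten, atom serial numbers and coordinates pass through unchanged. ANISOU records are not handled in this sketch.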

Following your comment, and with my input above, you can get the following output:

ATOM     25  N   ALA E   1      48.087  97.950  74.514  1.00  9.33           N  
ATOM     26  CA  ALA E   1      48.052  99.292  73.904  1.00  9.37           C  
ATOM     27  C   ALA E   1      47.483 100.285  74.935  1.00  9.65           C  
ATOM     28  O   ALA E   1      47.693 101.493  74.908  1.00  9.11           O  
ATOM     29  CB  ALA E   1      47.247  99.339  72.623  1.00  8.31           C  
ATOM     30  N   ILE E   2      46.802  99.657  75.862  1.00  9.99           N  
ATOM     31  CA  ILE E   2      46.118 100.279  77.004  1.00 10.34           C  
ATOM     32  C   ILE E   2      46.521  99.491  78.253  1.00 10.35           C  
ATOM     33  O   ILE E   2      46.292  98.274  78.348  1.00  9.61           O  
ATOM     34  CB  ILE E   2      44.613 100.230  76.772  1.00 11.05           C  
ATOM     35  CG1 ILE E   2      44.269 100.841  75.413  1.00 11.39           C  
ATOM     36  CG2 ILE E   2      43.845 100.913  77.879  1.00 11.06           C  
ATOM     37  CD1 ILE E   2      42.926 100.408  74.814  1.00 11.29           C  
ATOM     30  N   ILE E   3      46.802  99.657  75.862  1.00  9.99           N  
ATOM     31  CA  ILE E   3      46.118 100.279  77.004  1.00 10.34           C  
ATOM     32  C   ILE E   3      46.521  99.491  78.253  1.00 10.35           C  
ATOM     33  O   ILE E   3      46.292  98.274  78.348  1.00  9.61           O  
ATOM     34  CB  ILE E   3      44.613 100.230  76.772  1.00 11.05           C  
ATOM     35  CG1 ILE E   3      44.269 100.841  75.413  1.00 11.39           C  
ATOM     36  CG2 ILE E   3      43.845 100.913  77.879  1.00 11.06           C  
ATOM     37  CD1 ILE E   3      42.926 100.408  74.814  1.00 11.29           C  
ATOM     38  N   GLN E   4      47.184 100.177  79.159  1.00 10.08           N  
ATOM     39  CA  GLN E   4      47.750  99.648  80.383  1.00 10.85           C  
ATOM     40  C   GLN E   4      46.749  99.311  81.476  1.00 10.94           C  
ATOM     41  O   GLN E   4      45.812 100.068  81.762  1.00 10.33           O  
ATOM     42  CB  GLN E   4      48.855 100.550  80.962  1.00 11.19           C  
ATOM     43  CG  GLN E   4      50.227 100.292  80.353  1.00 11.71           C  
ATOM     44  CD  GLN E   4      50.656 101.322  79.346  1.00 12.04           C  
ATOM     45  OE1 GLN E   4      50.015 101.625  78.348  1.00 11.94           O  
ATOM     46  NE2 GLN E   4      51.811 101.943  79.591  1.00 12.40           N  
ATOM     47  N   PRO E   5      46.990  98.145  82.066  1.00 11.13           N  
ATOM     48  CA  PRO E   5      46.204  97.689  83.212  1.00 11.66           C  
ATOM     49  C   PRO E   5      46.688  98.594  84.352  1.00 11.77           C  
ATOM     50  O   PRO E   5      47.885  98.899  84.409  1.00 11.72           O  
ATOM     51  CB  PRO E   5      46.586  96.236  83.432  1.00 11.66           C  
ATOM     52  CG  PRO E   5      47.935  96.031  82.787  1.00 11.65           C  
ATOM     53  CD  PRO E   5      48.114  97.207  81.829  1.00 11.20           C  
TER      53      PRO E   5                                                       
END   

Just change the renumbering block to the following:

# renumber residues in my_pdb_structure
residue_N = 1
for model in my_pdb_structure:
    for chain in model:
        for residue in chain:
            print(residue.id)
            # give every residue its own number and blank the insertion code
            residue.id = (residue.id[0], residue_N, " ")
            print('----', residue.id)
            residue_N += 1

This renumbers all residues counting from 1 and removes every A or other insertion letter from the pdb.
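
One caveat, as far as I can tell: recent Biopython versions raise a ValueError when a residue is given an id that another residue in the same chain still holds, so the in-place loop can fail on files where old and new numbers overlap (for example 1, 2, 2A, 3). A text-based version avoids that bookkeeping. The sketch below is an assumption-laden alternative, not Biopython API: it relies on standard fixed-width PDB columns, the name renumber_sequentially is made up, and the chain H line in the sample is invented. It also returns a mapping from old ids to new numbers so the original numbering can be traced back.

```python
def renumber_sequentially(pdb_lines):
    """Number every residue 1, 2, 3, ... across all chains and blank insertion codes.

    Returns the rewritten lines and a dict mapping each original
    (chain, number, insertion code) to its new number.
    """
    new_lines = []
    mapping = {}
    for line in pdb_lines:
        is_residue_record = line.startswith(("ATOM", "HETATM", "TER"))
        # Skip records without a residue number (e.g. a bare "TER" line).
        if is_residue_record and line[22:26].strip():
            key = (line[21], line[22:26].strip(), line[26:27].strip())
            if key not in mapping:
                mapping[key] = len(mapping) + 1
            # Columns 23-26 hold the number, column 27 the insertion code.
            line = line[:22] + f"{mapping[key]:>4} " + line[27:]
        new_lines.append(line)
    return new_lines, mapping


# Four lines taken from testA.pdb plus one invented chain H line.
sample = [
    "ATOM     25  N   ALA E   5      48.087  97.950  74.514  1.00  9.33           N  ",
    "ATOM     30  N   ILE E   6      46.802  99.657  75.862  1.00  9.99           N  ",
    "ATOM     30  N   ILE E   6A     46.802  99.657  75.862  1.00  9.99           N  ",
    "ATOM     38  N   GLN E   7      47.184 100.177  79.159  1.00 10.08           N  ",
    "ATOM     47  N   PRO H   1      46.990  98.145  82.066  1.00 11.13           N  ",
]
new_lines, mapping = renumber_sequentially(sample)
for old_id, new_number in mapping.items():
    print(old_id, "->", new_number)
```

To use it on a file, read the lines with open("testA.pdb").readlines() and write the returned lines back with writelines(). Since residues are keyed by their original id, a file with several MODEL blocks gets the same new numbers in every model.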