How to add a space between PDB coordinates only when there is no space in between?

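The software I use does not accept the input when there is no space between any two XYZ coordinates. For example, "38.420 -6.206-108.383" can only be processed when it is written as "38.420 -6.206 -108.383". Since I have more than 1000 PDB files to process, each containing hundreds of lines of coordinates, I urgently need an efficient way to detect whether a space is missing between any two coordinates and, if so, insert one. After the space is inserted, the old coordinates in the ".inp" file need to be replaced with the new coordinates as input.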

How can I do this in Python?

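I tried changing the coordinates by hand, but soon realized that this is practically impossible... The code below does work, but it only pastes the original coordinates without inserting a space where one is needed.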

The current code is:

'''

input = open("hole.inp").readlines()
old_cpoint = input[16][7:29]

for i in range(1002):
    with open(dir + str(i) + '.pdb') as file:
        c = file.readlines()
        new_cpoint = c[2849][31:54]

        f_a = open("hole_"+str(i)+".inp").read()
        f_a = f_a.replace(str(old_cpoint),str(new_cpoint))
        f_b = open("hole_"+str(i)+".inp", 'w')
        f_b.write(f_a)
        f_b.close()

'''

Solution

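Please take a look at this solution - hopefully it is robust enough to parse all of your data. The function to use is parse_coordinates. I tried to be explicit in the comments, but let me know if anything is unclear!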

# Declare the string separator to insert it between the coordinates
SEPARATOR = " "

# Declare the offset - there are always 3 digits after the dots - it will be used to distinguish each coordinate
OFFSET = 3


def parse_coordinates(data):

    # You can remove this line if you don't need to save a copy of your data
    data = data.copy()

    # Iterate over the coordinates - we will need the iterator ("i") to access and modify the elements at each position
    for i, coordinates in enumerate(data):

        # Since there are always 3 coordinates we can extract the information about first 2 dots and add the offset
        sep_index_0 = coordinates.find(".") + OFFSET + 1
        sep_index_1 = coordinates.find(".", sep_index_0 + 1) + OFFSET + 1

        # Insert the separator if needed - need to increase sep_index_1 because the string is now longer
        if coordinates[sep_index_0] != SEPARATOR:
            coordinates = str.join(SEPARATOR, (coordinates[:sep_index_0], coordinates[sep_index_0:]))
            sep_index_1 += 1

        # Similarly for the second place where there should be a separator, except no need to increase the index anymore
        if coordinates[sep_index_1] != SEPARATOR:
            coordinates = str.join(SEPARATOR, (coordinates[:sep_index_1], coordinates[sep_index_1:]))

        # You haven't explicitly mentioned it, but this ensures there are single spaces
        coordinates = coordinates.replace(SEPARATOR * 2, SEPARATOR)

        # Modify the existing value to match the expected pattern
        data[i] = coordinates

    return data


# Read your input - can't really get simpler!
with open("input.in") as f:
    to_parse = f.readlines()

# Write to your output - this also makes sure there are exactly 2 spaces in the parsed coordinates
with open("output.out", "w") as f:
    for parsed_coordinates in parse_coordinates(to_parse):
        assert parsed_coordinates.count(SEPARATOR) == 2
        f.write(parsed_coordinates)

With the following input (it should be comprehensive enough, but do test it yourself!):

38.420 -6.206-108.383
38.420-6.206-108.383
38.420-6.206 -108.383
38.420 -6.206 -108.383
38.420 6.206 108.383
38.4206.206 108.383
38.420 6.206108.383
38.4206.206108.383
38.420  6.206  108.383

The output is:

38.420 -6.206 -108.383
38.420 -6.206 -108.383
38.420 -6.206 -108.383
38.420 -6.206 -108.383
38.420 6.206 108.383
38.420 6.206 108.383
38.420 6.206 108.383
38.420 6.206 108.383
38.420 6.206 108.383
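
As an alternative to tracking the separator indices by hand, the same normalization can be sketched with a regular expression. This is not the approach used in parse_coordinates above, just a shorter variant under the same assumption that every coordinate has exactly three digits after the dot. The names COORDINATE and split_coordinates are made up for this example.

```python
import re

# One coordinate: optional minus sign, digits, a dot, exactly three decimals.
# Assumption: three decimals per coordinate, as in the sample data above.
COORDINATE = re.compile(r"-?\d+\.\d{3}")


def split_coordinates(line):
    """Return the three coordinates found in `line`, joined by single spaces."""
    values = COORDINATE.findall(line)
    if len(values) != 3:
        # Fail loudly instead of silently writing a malformed line
        raise ValueError(f"expected 3 coordinates, found {len(values)}: {line!r}")
    return " ".join(values)


print(split_coordinates("38.420 -6.206-108.383"))  # 38.420 -6.206 -108.383
print(split_coordinates("38.4206.206108.383"))     # 38.420 6.206 108.383
```

Because the pattern consumes exactly three decimals, glued values such as "38.4206.206" are split at the right place even without a minus sign in between. Note that the returned string has no trailing newline, so add one when writing to a file.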

Bonus

Some additional thoughts:

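  1. Avoid using input as a variable name - it is already a built-in function in Python
  2. You open quite a few files - avoid that where you can, since it can slow the computation down considerably
  3. You may want to look at os.path.join for building file paths - it is very simple to use and very useful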
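
To show how these tips could fit together with the loop from the question, here is a minimal sketch. It relies on the fact that PDB ATOM/HETATM records are fixed-width: x, y and z occupy columns 31-38, 39-46 and 47-54, which is why wide values end up touching each other. Slicing by column avoids the missing-space problem at the source. The function names, the directory arguments and the hole_<i>.inp naming (borrowed from the question) are assumptions for illustration, not a drop-in replacement.

```python
import os


def coordinates_from_atom_line(line):
    """Slice x, y, z out of a PDB ATOM/HETATM line by column position."""
    # 1-based columns 31-38, 39-46, 47-54 map to these 0-based slices
    fields = (line[30:38], line[38:46], line[46:54])
    return " ".join(field.strip() for field in fields)


def update_inp(pdb_path, inp_path, old_cpoint, atom_line_index):
    """Replace old_cpoint in one .inp file with coordinates from one PDB line."""
    # `with` closes each file as soon as the block ends
    with open(pdb_path) as pdb_file:
        atom_line = pdb_file.readlines()[atom_line_index]
    new_cpoint = coordinates_from_atom_line(atom_line)

    with open(inp_path) as inp_file:
        text = inp_file.read()
    with open(inp_path, "w") as inp_file:
        inp_file.write(text.replace(old_cpoint, new_cpoint))


def update_all(pdb_dir, inp_dir, count, old_cpoint, atom_line_index):
    """Process <i>.pdb and hole_<i>.inp for i in range(count) (hypothetical layout)."""
    for i in range(count):
        update_inp(
            os.path.join(pdb_dir, f"{i}.pdb"),
            os.path.join(inp_dir, f"hole_{i}.inp"),
            old_cpoint,
            atom_line_index,
        )


# A synthetic ATOM line: 30 columns of padding, then three 8.3f fields
example = f"{'ATOM':<30}{38.420:8.3f}{-6.206:8.3f}{-108.383:8.3f}"
print(coordinates_from_atom_line(example))  # 38.420 -6.206 -108.383
```

In the question's terms, old_cpoint would be the slice read once from hole.inp and atom_line_index the line number used there (2849). Both are passed in rather than hard-coded so the sketch makes no claim about your files.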