Split string using capture groups

Question

I have two strings

/some/path/to/sequence2.1001.tif

and

/some/path/to/sequence_another_u1_v2.tif

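I would like to write a function that uses a regular expression to split either string into a list, such that the parts can be joined back together without losing any characters.

So, for example: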

import re

def split_by_group(path, re_compile):
    # ...
    return ['the', 'parts', 'here']

split_by_group('/some/path/to/sequence2.1001.tif', re.compile(r'\.(\d+)\.'))
# Result: ['/some/path/to/sequence2.', '1001', '.tif']

split_by_group('/some/path/to/sequence_another_u1_v2.tif', re.compile(r'_[uv](\d+)'))
# Result: ['/some/path/to/sequence_another_u', '1', '_v', '2', '.tif']

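It doesn't matter whether the regular expressions are exactly as I wrote them above (though ideally I'd like the accepted answer to work with both). My only criteria are that the split string can be recombined without losing any digits, and that each group is split the way I showed above (the split happens right at the start/end of the capture group, not of the full matched string).

I put something together with finditer, but it's horribly hacky and I'm looking for a cleaner way. Can anyone help me out?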

Answer 1

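If you don't mind, change your regular expressions slightly so that the prefix is in a capture group of its own. I'm not sure whether this works for your other cases.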

import re

def split_by_group(path, re_compile):
    # split() also returns the text of every capture group; drop empty pieces
    parts = [s for s in re_compile.split(path) if s]
    # Re-attach the first captured prefix to the leading text
    parts[0:2] = [''.join(parts[0:2])]
    return parts

split_by_group('/some/path/to/sequence2.1001.tif', re.compile(r'(\.)(\d+)'))
# Result: ['/some/path/to/sequence2.', '1001', '.tif']

split_by_group('/some/path/to/sequence_another_u1_v2.tif', re.compile(r'(_[uv])(\d+)'))
# Result: ['/some/path/to/sequence_another_u', '1', '_v', '2', '.tif']
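
If changing the patterns is not an option, one possible alternative is to cut the string at the start and end of every capture group, using the offsets that `Match.span()` reports for each `finditer` match. The following is only a minimal sketch under that assumption; the name `split_by_group_spans` is made up for illustration, and it is shown with the two patterns from the question.

```python
import re

def split_by_group_spans(path, pattern):
    """Cut path at the start and end of every capture group of every match."""
    cuts = set()
    for match in pattern.finditer(path):
        for index in range(1, pattern.groups + 1):
            start, end = match.span(index)
            if start != -1:  # (-1, -1) means the group did not take part in the match
                cuts.update((start, end))
    # Always include both ends of the string so that no character is dropped
    bounds = sorted(cuts | {0, len(path)})
    return [path[a:b] for a, b in zip(bounds, bounds[1:])]

first = split_by_group_spans('/some/path/to/sequence2.1001.tif',
                             re.compile(r'\.(\d+)\.'))
# Result: ['/some/path/to/sequence2.', '1001', '.tif']

second = split_by_group_spans('/some/path/to/sequence_another_u1_v2.tif',
                              re.compile(r'_[uv](\d+)'))
# Result: ['/some/path/to/sequence_another_u', '1', '_v', '2', '.tif']
```

Because the pieces are slices between consecutive sorted offsets, `''.join(...)` of the result should always reproduce the input string.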


python

string