Find consecutive characters in a string + their start and end indices (python)

Question

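I will be working with strings about 365 characters long. Within these strings I want to find consecutive runs of the character '-', along with the start and end index of each run. This should include runs where the character occurs only once.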

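Consider the following string: 'a---b-cccc-----'. I want to know that there is a run of three '-' characters, then a single occurrence, then another run of five '-' characters. I also want to know where each run starts and ends. Reporting the results as a list of tuples (start, end, run length) would be nice, for example: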

[(1, 3, 3), (5, 5, 1), (10, 14, 5)]

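I considered combining itertools with enumerate, but I can't get it quite right. I cobbled this together from an earlier question, but it is missing the start index: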

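import itertools

s = 'a---b-cccc-----'
counts = []
count = 1
# Pair each character with its successor; the last character is paired with None.
for idx, (a, b) in enumerate(itertools.zip_longest(s, s[1:], fillvalue=None)):
    if a == b == "-":
        count += 1  # still inside a run
    elif a != b and a == "-":
        counts.append((idx, count))  # the run ends at idx
        count = 1
print(counts)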

Output:

[(3, 3), (5, 1), (14, 5)]

I also pieced together the following from other questions:

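from itertools import groupby

# Group (index, char) pairs by character, keeping only the '-' groups.
g = groupby(enumerate(s), lambda x: x[1])
l = [(x[0], list(x[1])) for x in g if x[0] == '-']
[(x[1][0][0], x[1][-1][0], len(x[1])) for x in l]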

Output:

[(1, 3, 3), (5, 5, 1), (10, 14, 5)]

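This seems to work, but I don't fully understand its logic, and I'm not sure it will always work. Is there a better way? Or is this already as efficient as it can be? I will need to run hundreds of thousands of these searches, so efficiency is key.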

Thanks!
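On the logic of the groupby snippet: it can be unpacked into an explicit loop, which may make it easier to see why it works. The following is only a sketch of the same idea, not a different algorithm; the function name `dash_runs_groupby` and the `ch` parameter are hypothetical names introduced here for illustration.

```python
from itertools import groupby


def dash_runs_groupby(s, ch="-"):
    """Return (start, end_inclusive, length) for each run of ch in s."""
    runs = []
    # enumerate pairs every character with its index: (0, 'a'), (1, '-'), ...
    # groupby then batches neighbouring pairs that share the same character.
    for char, group in groupby(enumerate(s), key=lambda pair: pair[1]):
        if char != ch:
            continue  # skip groups made of other characters
        indices = [i for i, _ in group]
        # First and last index of the group are the run's boundaries.
        runs.append((indices[0], indices[-1], len(indices)))
    return runs


print(dash_runs_groupby("a---b-cccc-----"))
# [(1, 3, 3), (5, 5, 1), (10, 14, 5)]
```

Because groupby only merges adjacent equal keys, each group is exactly one consecutive run, so the approach should also handle runs at the very start or end of the string and strings with no '-' at all.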

Answer 1

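One way, using re.finditer (note that m.span() returns an exclusive end index, so each end is one past the last '-' of the run):

import re

[(*m.span(), len(m.group(0))) for m in re.finditer("-+", s)]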

Output:

[(1, 4, 3), (5, 6, 1), (10, 15, 5)]
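If the inclusive end index from the question is wanted, the re.finditer approach can be adapted by subtracting one from the match end. This is a minimal sketch under that assumption; `find_runs` and its `ch` parameter are hypothetical names, and `re.escape` is used only so that characters with a special meaning in regular expressions can be passed safely.

```python
import re


def find_runs(s, ch="-"):
    """Return (start, end_inclusive, length) for each run of ch in s."""
    # One or more consecutive occurrences of ch.
    pattern = re.compile(re.escape(ch) + "+")
    return [
        (m.start(), m.end() - 1, m.end() - m.start())
        for m in pattern.finditer(s)
    ]


print(find_runs("a---b-cccc-----"))
# [(1, 3, 3), (5, 5, 1), (10, 14, 5)]
```

When the same character is searched for many times, compiling the pattern once outside the function and reusing it is a reasonable variation; whether it makes a measurable difference is best checked with timeit on representative strings.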
