Split a binary number into groups of 0s and 1s and get the boundary indexes

Question

I have a binary number that I need to split into groups of 0s and 1s. I also need the start and end index of each newly formed group.
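If the binary number is held as a Python int rather than as a string of digits, it has to be converted to a string first. This is a minimal sketch that assumes the value is an int. The width 12 below is only an illustrative choice.

```python
n = 0b1100111100       # the example value stored as an int (assumption)
bits = format(n, 'b')  # binary digits without the '0b' prefix
print(bits)            # 1100111100

# Leading zeros are not kept; a fixed width such as '012b' pads them back.
print(format(n, '012b'))  # 001100111100
```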

For example, suppose the number is 1100111100.

I need to split it into the groups 11, 00, 1111, 00.

The start and end indexes of each group (counting from 1, both ends inclusive) should then be:

11: (1,2), 00: (3,4), 1111: (5,8) and 00: (9,10)
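Before reaching for a library, the indexing convention can be illustrated with a single pass that closes a group wherever the digit changes. This is a minimal sketch that assumes the input is already a digit string. `split_runs` is a hypothetical helper name and not part of any library.

```python
def split_runs(bits):
    """Return (group, start, end) tuples with 1-based, inclusive indexes."""
    runs = []
    start = 0  # 0-based index where the current run begins
    for i in range(1, len(bits) + 1):
        # A run ends at the end of the string or where the digit changes.
        if i == len(bits) or bits[i] != bits[start]:
            # start + 1 is the 1-based start; i (exclusive, 0-based) is the 1-based end.
            runs.append((bits[start:i], start + 1, i))
            start = i
    return runs

print(split_runs('1100111100'))
# [('11', 1, 2), ('00', 3, 4), ('1111', 5, 8), ('00', 9, 10)]
```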

I plan to use Python. I did some research and found that itertools could help, but I'm not sure which itertools function to use.

Any help is greatly appreciated.

Thanks

Answer 1

Please try the following:

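import re

s = '1100111100'  # avoid naming the variable str, which shadows the built-in
groups = re.findall(r'0+|1+', s)    # groups == ['11', '00', '1111', '00']
result = []
pos = 1  # 1-based start index of the current group
for g in groups:
    result.append("%s : (%d,%d)" % (g, pos, pos + len(g) - 1))
    pos += len(g)

print(result)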

Output:

['11 : (1,2)', '00 : (3,4)', '1111 : (5,8)', '00 : (9,10)']
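If the indexes are needed as numbers rather than as pre-formatted strings, the same `re.findall` result can be paired with running totals from `itertools.accumulate`. This is a sketch under that assumption. The names `groups`, `ends`, and `spans` are illustrative.

```python
import re
from itertools import accumulate

s = '1100111100'
groups = re.findall(r'0+|1+', s)
# Running total of group lengths = 1-based inclusive end index of each group.
ends = list(accumulate(len(g) for g in groups))
# The start of each group is its end minus its length, plus one.
spans = [(g, end - len(g) + 1, end) for g, end in zip(groups, ends)]
print(spans)
# [('11', 1, 2), ('00', 3, 4), ('1111', 5, 8), ('00', 9, 10)]
```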

Answer 2

This can be done in one line with a regular expression:

a = "1100111100"
[' : '.join([i.group(),str((i.start()+1,i.end()))]) for i in re.finditer("0+|1+",a)]

re.finditer

Return an iterator yielding match objects over all non-overlapping matches for the RE pattern in string.

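In other words, each non-overlapping run of 0s or 1s is yielded as a match object, and its start() and end() give the run's position in the string.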
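Each match object carries its own position, and that is where the `+1` in the one-liner comes from: `span()` returns a 0-based start and an exclusive end. This is a minimal sketch that makes the conversion explicit. `pairs` is an illustrative name.

```python
import re

s = '1100111100'
pairs = []
for m in re.finditer(r'0+|1+', s):
    start0, end_excl = m.span()  # 0-based start, exclusive end
    # +1 converts the start to 1-based; the exclusive 0-based end already
    # equals the inclusive 1-based end, so it is used unchanged.
    pairs.append((m.group(), (start0 + 1, end_excl)))

print(pairs)
# [('11', (1, 2)), ('00', (3, 4)), ('1111', (5, 8)), ('00', (9, 10))]
```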

Output:

['11 : (1, 2)', '00 : (3, 4)', '1111 : (5, 8)', '00 : (9, 10)']

Answer 3

Using itertools.groupby:

from itertools import groupby

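def func(string):
    i = 1  # 1-based start index of the current group
    for _, g in groupby(string):  # g iterates over one run of equal digits
        g = ''.join(g)
        j = len(g)
        yield (i, i+j-1), g  # ((start, end), substring), end inclusive
        i += j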

>>> dict(func('1100111100'))
{(1, 2): '11', (3, 4): '00', (5, 8): '1111', (9, 10): '00'}

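Passing the generator to dict() gives a dictionary whose keys are the (start, end) index pairs and whose values are the substrings.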
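`groupby` also yields the key of each run, which the answer above discards as `_`. If only the digit and its boundaries are needed, the run length can be counted without joining the characters back into a string. This is a sketch. `runs` is a hypothetical helper name.

```python
from itertools import groupby

def runs(bits):
    """Return (digit, start, end) tuples with 1-based, inclusive indexes."""
    out = []
    pos = 1  # 1-based start index of the current run
    for digit, group in groupby(bits):
        length = sum(1 for _ in group)  # count the run without building a string
        out.append((digit, pos, pos + length - 1))
        pos += length
    return out

print(runs('1100111100'))
# [('1', 1, 2), ('0', 3, 4), ('1', 5, 8), ('0', 9, 10)]
```

A list keeps the digit next to its indexes. The dictionary form above works equally well because no two runs share the same (start, end) pair.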
