Remove leading zeros in the middle of a string with regex

Question

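I have a large number of strings in the format YYYYYYYYXXXXXXXXZZZZZZZZ, where X, Y and Z are fixed-length, eight-digit numbers. I need to parse the middle sequence of digits and remove any leading zeros. Unfortunately, the only way to determine where each of the three sequences begins and ends is to count the digits.

I'm currently doing this in two steps: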

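import re

m = re.match(
    r"(?P<first_sequence>\d{8})"
    r"(?P<second_sequence>\d{8})"
    r"(?P<third_sequence>\d{8})",
    string)
second_sequence = m.group("second_sequence")
# lstrip() expects a string of characters to strip, so pass "0" rather than the integer 0
second_sequence = second_sequence.lstrip("0")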

This works and gives me the right results, e.g.:

112233441234567855667788 --> 12345678
112233440012345655667788 --> 123456
112233001234567855667788 --> 12345678
112233000012345655667788 --> 123456

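But is there a better way? Is it possible to write a single regular expression that matches the second sequence without its leading zeros?

I think I'm looking for a regular expression that does the following:

Skip over the first eight digits.
Skip any leading zeros.
Capture everything after that, up to the point where there are sixteen characters behind and eight ahead.

As mentioned, the solution above does work, so the purpose of this question is more to improve my knowledge of regular expressions. Thanks for any pointers.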

Answer 1

I think it's simpler not to use a regular expression at all.

result = my_str[8:16].lstrip('0')
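
As a quick check, the slice-and-strip approach can be run against the four sample strings from the question. This is only a sketch: the helper name middle_field is made up for illustration, and it assumes every input really is 24 digits long.

```python
def middle_field(s):
    """Hypothetical helper: characters 8..15 of s with leading zeros removed."""
    return s[8:16].lstrip("0")

samples = [
    "112233441234567855667788",
    "112233440012345655667788",
    "112233001234567855667788",
    "112233000012345655667788",
]

results = [middle_field(s) for s in samples]
print(results)  # ['12345678', '123456', '12345678', '123456']
```

Because the slice is taken before stripping, zeros at the end of the first sequence (as in the last two samples) are never touched.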

Answer 2

This is a typical case of "useless use of regular expressions".

Your strings are fixed-length. Just cut them at the right positions.

s = "112233440012345655667788"
int(s[8:16])
# -> 123456
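
One difference from the lstrip approach is worth keeping in mind: int() returns a number, while lstrip('0') returns a string. The two also diverge when the middle field is all zeros. The sample string below is invented for illustration and does not come from the question.

```python
s = "112233440000000055667788"  # made-up input whose middle field is all zeros

as_int = int(s[8:16])          # parses the eight digits as a number
as_str = s[8:16].lstrip("0")   # only removes characters; the result stays a string

print(repr(as_int))  # 0
print(repr(as_str))  # ''
```

Which of the two is preferable probably depends on whether the value is used as a number or as text later on.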

Answer 3

I agree with the other answers here that a regular expression isn't really necessary. If you really want to use one, then \d{8}0*(\d*)\d{8} should do it.
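
A minimal sketch of how that pattern might be used follows. The function name middle_with_regex is hypothetical, and re.fullmatch is used here on the assumption that each input is exactly 24 digits; the pattern itself does not enforce the length of the middle field.

```python
import re

# \d{8}  skips the first sequence
# 0*     consumes the leading zeros of the middle sequence
# (\d*)  captures the rest; it is greedy, but backtracks until
#        exactly eight digits are left for the final \d{8}
pattern = re.compile(r"\d{8}0*(\d*)\d{8}")

def middle_with_regex(s):
    """Hypothetical helper: middle field without leading zeros, or None if no match."""
    m = pattern.fullmatch(s)
    return m.group(1) if m else None

print(middle_with_regex("112233440012345655667788"))  # 123456
print(middle_with_regex("112233001234567855667788"))  # 12345678
```

The capture group comes out empty when the middle field consists only of zeros, which matches what lstrip('0') would give.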

Answer 4

Just to show that it is possible with a regular expression:

https://regex101.com/r/8RSxaH/2

# CODE AUTO GENERATED BY REGEX101.COM (SEE LINK ABOVE)
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(?<=\d{8})((?:0*)(\d{,8}))(?=\d{8})"

test_str = ("112233441234567855667788\n"
    "112233440012345655667788\n"
    "112233001234567855667788\n"
    "112233000012345655667788")

matches = re.finditer(regex, test_str)

for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

That said, you don't really need it for what you're asking.
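
For a single string, the generated script can probably be condensed to a few lines. This is a sketch rather than part of the regex101 output: the function name middle_via_lookaround is made up, and it assumes a 24-digit input so that the first match starts right after the first eight digits.

```python
import re

# Same pattern as in the generated script: group 1 is the whole middle field,
# group 2 is the middle field without its leading zeros.
regex = re.compile(r"(?<=\d{8})((?:0*)(\d{,8}))(?=\d{8})")

def middle_via_lookaround(s):
    """Hypothetical helper: group 2 of the first match, or None if nothing matches."""
    m = regex.search(s)
    return m.group(2) if m else None

print(middle_via_lookaround("112233000012345655667788"))  # 123456
```

The lookbehind and lookahead only assert that eight digits sit on each side, so the match itself covers just the middle field.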
