Placing only digits in capture groups when converting a string to an array of ints

Question

Background

This question is inspired by the following question on Code Review: Converting a string to an array of integers. It opens as follows:

I am dealing with a string draw_result that can be in one of the following formats:
"03-23-27-34-37, Mega Ball: 13" 
"01-12 + 08-20" 
"04-15-17-25-41"
I always start with draw_result where the value is one from the above values. I want to get to:
[3, 23, 27, 34, 37] 
[1, 12, 8, 20]
[4, 15, 17, 25, 41]

This problem can be solved with multiple regular expressions, as follows

import re
from typing import Iterable

lottery_searches = [
    re.compile(pat).match
    for pat in (
        r'^(\d\d)-(\d\d)-(\d\d)-(\d\d)-(\d\d), Mega Ball.*$',
        r'^(\d\d)-(\d\d) \+ (\d\d)-(\d\d)$',
        r'^(\d\d)-(\d\d)-(\d\d)-(\d\d)-(\d+)$',
    )
]


def lottery_string_to_ints(lottery: str) -> Iterable[int]:
    for search in lottery_searches:
        if match := search(lottery):
            return (int(g) for g in match.groups())

    raise ValueError(f'"{lottery}" is not a valid lottery string')

Question

Suppose we allow separators other than -, but they must all be the same within one string. For example
```
"04/15/17/25/41"
"04,15,17,25,41"
"01,12 + 08,20"
"01?12 + 08?20"
```
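are all valid formats.
Is it now possible to have only the digits in capture groups? Is there some way to tag all the digit capture groups so that they are easy to retrieve?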

Attempted solution

Regular expression

PATTERN = re.compile(
    r"""
         (?P<digit0>\d\d)                   # Matches a two-digit number [00..99] and names it digit0
         (?P<sep>-)                         # Matches the separator (here a literal -) and saves it as sep
         (?P<digit1>\d\d)                   # Matches a two-digit number [00..99] and names it digit1
         (\s+\+\s+|(?P=sep))                # Matches SPACE + SPACE OR the separator saved in sep (-)
         (?P<digit2>\d\d)                   # Matches a two-digit number [00..99] and names it digit2
         (?P=sep)                           # Matches the same separator that was saved in sep
         (?P<digit3>\d\d)                   # Matches a two-digit number [00..99] and names it digit3
         ((?P=sep)(?P<digit4>\d\d))?        # Optional fifth number (e.g. -01), saved to digit4
        """,
    re.VERBOSE,
)

Retrieval

def extract_numbers_narrow(draw_result, digits=5):
    numbers = []
    if match := re.match(PATTERN, draw_result):
        for i in range(digits):
            ith_digit = f"digit{i}"
            try:
                number = int(match.group(ith_digit))
            except IndexError:  # Catches if the group does not exist
                continue
            except TypeError:  # Catches if the group is None
                continue
            numbers.append(number)
    return numbers

Problem: I have to include a dirty try statement, because the fifth number may or may not be present in the result.

Answer 1

It looks like you want to get all the numbers before the comma. You can use this solution based on the PyPI regex package

import regex

texts = ['03-23-27-34-37, Mega Ball: 13', '01-12 + 08-20', '04-15-17-25-41']
reg = regex.compile(r'^(?:[^\w,]*(\d+))+')

for text in texts:
    match = reg.search(text)
    if match:
        print( text, '=>', list(map(int,match.captures(1))) )

See the online Python demo.

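The ^(?:[^\w,]*(\d+))+ regex matches, at the start of the string, one or more sequences of zero or more characters other than word and comma characters, followed by one or more digits (captured into group 1). Since regex keeps a stack of values for each capturing group, you can use .captures() to access all the captured numbers.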
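To illustrate why .captures() is needed, here is a minimal sketch that runs the same pattern through the built-in re module, which keeps only the last value of a repeated capturing group. The variable names are illustrative only.

```python
import re

text = "03-23-27-34-37, Mega Ball: 13"
match = re.match(r"^(?:[^\w,]*(\d+))+", text)

# The whole match covers every number before the comma ...
whole = match.group()
# ... but group 1 is overwritten on each repetition, so only the last value survives.
last = match.group(1)

print(whole)  # 03-23-27-34-37
print(last)   # 37
```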

If you need to use the built-in re, you can use

import re
 
texts = ['03-23-27-34-37, Mega Ball: 13', '01-12 + 08-20', '04-15-17-25-41']
reg = re.compile(r'^(?:[^\w,]*\d+)+')
 
for text in texts:
    match = reg.search(text)
    if match:
        print( text, '=>', list(map(int,re.findall(r'\d+', match.group()))) )

See this Python demo, where re.findall(r'\d+', ...) extracts the numbers from the matched value.

Both output:

03-23-27-34-37, Mega Ball: 13 => [3, 23, 27, 34, 37]
01-12 + 08-20 => [1, 12, 8, 20]
04-15-17-25-41 => [4, 15, 17, 25, 41]
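Neither pattern above checks that all separators in a string are the same, which the question asks for. A minimal sketch of one way to add that check with the built-in re follows. The names SAME_SEP, extract_numbers and nums are hypothetical, and the definition of a separator (one character that is not a word character, whitespace or +) is an assumption, not something stated in the question.

```python
import re

# Hypothetical pattern: the first separator is captured as "sep" and every
# later separator must repeat it through the backreference (?P=sep).
SAME_SEP = re.compile(
    r"""
    ^(?P<nums>                   # the part of the string that holds the numbers
        \d\d
        (?P<sep>[^\w\s+])        # assumed separator: one char, not a word char, whitespace or +
        \d\d
        (?:\s+\+\s+|(?P=sep))    # either " + " or the same separator again
        \d\d
        (?P=sep)
        \d\d
        (?:(?P=sep)\d\d)?        # optional fifth number
    )
    (?:,\s+Mega\s+Ball.*)?$      # optional trailing Mega Ball part, ignored
    """,
    re.VERBOSE,
)


def extract_numbers(draw_result):
    """Return the numbers as ints, or an empty list if the format is invalid."""
    match = SAME_SEP.match(draw_result)
    if match is None:
        return []
    # The digits are pulled out of the validated part, so no optional
    # numbered groups (and no try block) are needed.
    return [int(n) for n in re.findall(r"\d+", match.group("nums"))]


print(extract_numbers("04/15/17/25/41"))  # [4, 15, 17, 25, 41]
print(extract_numbers("01?12 + 08?20"))   # [1, 12, 8, 20]
print(extract_numbers("04-15/17-25-41"))  # [] because the separators differ
```

The pattern only validates the string. The numbers themselves come from re.findall, so a missing fifth number needs no special handling.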
