Placing only digits in capture groups when converting a string to an array of ints

Background

This question was inspired by the following question on Code Review: Converting a string to an array of integers. It opens as follows:

I am dealing with a string draw_result that can be in one of the following formats:

"03-23-27-34-37, Mega Ball: 13" 
"01-12 + 08-20" 
"04-15-17-25-41"

I always start with draw_result where the value is one from the above values. I want to get to:

[3, 23, 27, 34, 37] 
[1, 12, 8, 20]
[4, 15, 17, 25, 41]

This problem can be solved with multiple regular expressions, as follows:

import re
from typing import Iterable

lottery_searches = [
    re.compile(pat).match
    for pat in (
        r'^(\d\d)-(\d\d)-(\d\d)-(\d\d)-(\d\d), Mega Ball.*$',
        r'^(\d\d)-(\d\d) \+ (\d\d)-(\d\d)$',
        r'^(\d\d)-(\d\d)-(\d\d)-(\d\d)-(\d+)$',
    )
]


def lottery_string_to_ints(lottery: str) -> Iterable[int]:
    for search in lottery_searches:
        if match := search(lottery):
            return (int(g) for g in match.groups())

    raise ValueError(f'"{lottery}" is not a valid lottery string')

Question

Attempted solution

Regex

PATTERN = re.compile(
    r"""
         (?P<digit0>\d\d)                   # Matches a two-digit number [00..99] and names it digit0
         (?P<sep>-)                         # Matches a hyphen and saves it as sep
         (?P<digit1>\d\d)                   # Matches a two-digit number [00..99] and names it digit1
         (\s+\+\s+|(?P=sep))                # Matches SPACE + SPACE OR the separator saved in sep (-)
         (?P<digit2>\d\d)                   # Matches a two-digit number [00..99] and names it digit2
         (?P=sep)                           # Matches the separator saved in sep (-)
         (?P<digit3>\d\d)                   # Matches a two-digit number [00..99] and names it digit3
         ((?P=sep)(?P<digit4>\d\d))?        # Optionally matches a fifth number (e.g. -01) and names it digit4
        """,
    re.VERBOSE,
)

Retrieval

def extract_numbers_narrow(draw_result, digits=5):
    numbers = []
    if match := re.match(PATTERN, draw_result):
        for i in range(digits):
            ith_digit = f"digit{i}"
            try:
                number = int(match.group(ith_digit))
            except IndexError:  # Raised if no group with that name exists
                continue
            except TypeError:  # Catches if the group is None
                continue
            numbers.append(number)
    return numbers
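
As a possible simplification of the retrieval step, here is a minimal sketch that reads the named groups through match.groupdict() instead of a try/except loop. The helper name extract_numbers_groupdict is hypothetical, and the pattern is a compact restatement of the verbose one above, with the unnamed groups made non-capturing (an assumption made for brevity, not part of the original attempt).

```python
import re

# Compact restatement of the verbose pattern above; the unnamed groups are
# written as non-capturing here (an assumption for this sketch).
PATTERN = re.compile(
    r"(?P<digit0>\d\d)(?P<sep>-)(?P<digit1>\d\d)"
    r"(?:\s+\+\s+|(?P=sep))"
    r"(?P<digit2>\d\d)(?P=sep)(?P<digit3>\d\d)"
    r"(?:(?P=sep)(?P<digit4>\d\d))?"
)


def extract_numbers_groupdict(draw_result):
    """Hypothetical helper: return the digitN groups that took part in the match."""
    match = PATTERN.match(draw_result)
    if match is None:
        return []
    # groupdict() maps every named group to its text, or None if it did not
    # participate; sorting by name yields digit0, digit1, ..., digit4.
    return [
        int(value)
        for name, value in sorted(match.groupdict().items())
        if name.startswith("digit") and value is not None
    ]


print(extract_numbers_groupdict("01-12 + 08-20"))
```

Groups that did not participate in the match (digit4 for the four-number format) come back as None and are simply filtered out, so neither exception handler is needed.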

It looks like you want to get all the numbers before the comma. You can use this solution based on the PyPI regex module:

import regex

texts = ['03-23-27-34-37, Mega Ball: 13', '01-12 + 08-20', '04-15-17-25-41']
reg = regex.compile(r'^(?:[^\w,]*(\d+))+')

for text in texts:
    match = reg.search(text)
    if match:
        print( text, '=>', list(map(int,match.captures(1))) )

See the online Python demo.

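The ^(?:[^\w,]*(\d+))+ regex matches, at the start of the string, one or more sequences of zero or more characters other than word and comma characters, followed by one or more digits (captured into Group 1). Since regex keeps a stack for each capturing group, you can access all the captured numbers with .captures().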
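
For contrast, a minimal sketch of what the built-in re module does with the same pattern: a repeated capturing group keeps only its last capture, which is why the per-group capture stack of the regex module is useful here. The variable names are illustrative only.

```python
import re

text = "03-23-27-34-37, Mega Ball: 13"
match = re.match(r"^(?:[^\w,]*(\d+))+", text)

# The whole match covers every number before the comma...
whole_match = match.group(0)
# ...but Group 1 only remembers the capture from the final repetition.
last_capture = match.group(1)

print(whole_match)   # 03-23-27-34-37
print(last_capture)  # 37
```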

If you need to use the built-in re, you can use

import re
 
texts = ['03-23-27-34-37, Mega Ball: 13', '01-12 + 08-20', '04-15-17-25-41']
reg = re.compile(r'^(?:[^\w,]*\d+)+')
 
for text in texts:
    match = reg.search(text)
    if match:
        print( text, '=>', list(map(int,re.findall(r'\d+', match.group()))) )

See this Python demo, where re.findall(r'\d+', ...) extracts the numbers from the match value.

Both output:

03-23-27-34-37, Mega Ball: 13 => [3, 23, 27, 34, 37]
01-12 + 08-20 => [1, 12, 8, 20]
04-15-17-25-41 => [4, 15, 17, 25, 41]