Placing only digits in capture groups when converting a string to an array of ints
Background
This question is inspired by the following question on Code Review:
Converting a string to an array of integers. It opens as follows:
I am dealing with a string draw_result that can be in one of the
following formats:
"03-23-27-34-37, Mega Ball: 13"
"01-12 + 08-20"
"04-15-17-25-41"
I always start with draw_result where the value is one from the above
values. I want to get to:
[3, 23, 27, 34, 37]
[1, 12, 8, 20]
[4, 15, 17, 25, 41]
This problem can be solved with multiple regexes, as follows:
import re
from typing import Iterable

lottery_searches = [
    re.compile(pat).match
    for pat in (
        r'^(\d\d)-(\d\d)-(\d\d)-(\d\d)-(\d\d), Mega Ball.*$',
        r'^(\d\d)-(\d\d) \+ (\d\d)-(\d\d)$',
        r'^(\d\d)-(\d\d)-(\d\d)-(\d\d)-(\d+)$',
    )
]


def lottery_string_to_ints(lottery: str) -> Iterable[int]:
    for search in lottery_searches:
        if match := search(lottery):
            return (int(g) for g in match.groups())

    raise ValueError(f'"{lottery}" is not a valid lottery string')
Question
Suppose we allow separators other than -, but they must all be the same. For example,
"04/15/17/25/41"
"04,15,17,25,41"
"01,12 + 08,20"
"01?12 + 08?20"
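are all valid formats.
Is it now possible to have only the digits in capture groups? Is there some way to label all the digit capture groups so they are easy to retrieve?
Attempted solution
Regex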
PATTERN = re.compile(
    r"""
    (?P<digit0>\d\d)            # Matches a two-digit number [00..99] and names it digit0
    (?P<sep>-)                  # Matches the separator (here a literal -) and saves it as sep
    (?P<digit1>\d\d)            # Matches a two-digit number [00..99] and names it digit1
    (\s+\+\s+|(?P=sep))         # Matches SPACE + SPACE OR the separator saved in sep (-)
    (?P<digit2>\d\d)            # Matches a two-digit number [00..99] and names it digit2
    (?P=sep)                    # Matches the same separator that was saved in sep
    (?P<digit3>\d\d)            # Matches a two-digit number [00..99] and names it digit3
    ((?P=sep)(?P<digit4>\d\d))? # Checks for an optional fifth number (-01), saves it to digit4
    """,
    re.VERBOSE,
)
Retrieval
def extract_numbers_narrow(draw_result, digits=5):
    numbers = []
    if match := re.match(PATTERN, draw_result):
        for i in range(digits):
            ith_digit = f"digit{i}"
            try:
                number = int(match.group(ith_digit))
            except IndexError:  # Raised if the group does not exist in the pattern
                continue
            except TypeError:  # Raised by int() if the group did not match (None)
                continue
            numbers.append(number)
    return numbers
- Problem: I have to include an ugly try statement, because the fifth number may or may not be present in the result.
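As an aside on the try statement: Match.group() returns None for a named group that exists in the pattern but did not take part in the match, so a plain None check is enough when every requested name is defined. The sketch below assumes that situation. PATTERN_SKETCH and extract_numbers_no_try are hypothetical names, and the pattern is a compact restatement of the one above with the non-digit groups made non-capturing.

```python
import re

# Hypothetical, compact restatement of the pattern above (not the original).
PATTERN_SKETCH = re.compile(
    r"(?P<digit0>\d\d)(?P<sep>-)(?P<digit1>\d\d)"
    r"(?:\s+\+\s+|(?P=sep))"               # " + " or the saved separator
    r"(?P<digit2>\d\d)(?P=sep)(?P<digit3>\d\d)"
    r"(?:(?P=sep)(?P<digit4>\d\d))?"       # optional fifth number
)


def extract_numbers_no_try(draw_result, digits=5):
    match = PATTERN_SKETCH.match(draw_result)
    if match is None:
        return []
    # group() yields None for a defined group that did not participate.
    values = (match.group(f"digit{i}") for i in range(digits))
    return [int(value) for value in values if value is not None]
```

This only works while digits does not exceed the number of digitN groups defined in the pattern; asking for an undefined name still raises IndexError.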
It seems you want to get all the numbers before the comma. You can use this solution based on the PyPI regex module:
import regex

texts = ['03-23-27-34-37, Mega Ball: 13', '01-12 + 08-20', '04-15-17-25-41']
reg = regex.compile(r'^(?:[^\w,]*(\d+))+')
for text in texts:
    match = reg.search(text)
    if match:
        print(text, '=>', list(map(int, match.captures(1))))
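The ^(?:[^\w,]*(\d+))+ regex matches, at the start of the string, one or more sequences of zero or more characters other than word and comma characters, followed by one or more digits (captured into Group 1). Since regex keeps a stack for each capturing group, you can access all the captured numbers with .captures().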
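For contrast, the built-in re module keeps only the last value of a repeated capturing group, so Group 1 alone cannot return every number there. A minimal sketch of that behaviour, using one of the sample strings (the variable names are illustrative only):

```python
import re

match = re.match(r'^(?:[^\w,]*(\d+))+', '01-12 + 08-20')

# With re, a repeated group only remembers its final iteration.
last_only = match.group(1)

# The whole match still spans every number, so it can be re-scanned.
whole = match.group()
numbers = [int(n) for n in re.findall(r'\d+', whole)]
```

Here last_only holds only the final number as a string, while numbers holds all four as ints.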
If you need to use the built-in re, you can use
import re

texts = ['03-23-27-34-37, Mega Ball: 13', '01-12 + 08-20', '04-15-17-25-41']
reg = re.compile(r'^(?:[^\w,]*\d+)+')
for text in texts:
    match = reg.search(text)
    if match:
        print(text, '=>', list(map(int, re.findall(r'\d+', match.group()))))
See this Python demo, where re.findall(r'\d+', ...) extracts the numbers from the match value.
Both output:
03-23-27-34-37, Mega Ball: 13 => [3, 23, 27, 34, 37]
01-12 + 08-20 => [1, 12, 8, 20]
04-15-17-25-41 => [4, 15, 17, 25, 41]
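Neither pattern above checks that the separators are all the same. If that requirement matters, one possible approach with the built-in re is to capture the first separator and repeat it with a backreference. This is only a sketch under assumptions: UNIFORM_SEP and extract_uniform are hypothetical names, a separator is taken to be any single character that is not a word character, whitespace, or +, and the numbers must run to the end of the string or be followed by a comma and whitespace.

```python
import re

# Hypothetical pattern, not taken from the posts above.
# (?P<sep>...) captures the first separator; (?P=sep) forces every later
# separator to be the same character. " + " is also allowed between numbers.
UNIFORM_SEP = re.compile(
    r"^\d+(?P<sep>[^\w\s+])\d+"      # first two numbers and the separator
    r"(?:(?:(?P=sep)|\s\+\s)\d+)*"   # more numbers after the same separator or " + "
    r"(?=$|,\s)"                     # must end here or continue with ", ..."
)


def extract_uniform(draw_result):
    """Return the numbers as ints, or None if the string does not fit."""
    match = UNIFORM_SEP.match(draw_result)
    if match is None:
        return None
    return [int(n) for n in re.findall(r"\d+", match.group())]
```

With this sketch, "01?12 + 08?20" gives [1, 12, 8, 20], while a mixed string such as "04-15/17-25-41" gives None.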