Using Python re and findall to match complex combination of digits in string

Question

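I'm trying to use the Python re library to parse strings that contain a street name followed by several (or just one) numbers separated by forward slashes.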

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'

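I want to match every number, including any part after a dot and any adjacent letters. If a hyphen joins two numbers with letters (as in 12a-12c), the whole range should count as a single match.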

Expected output:

['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']

I'm trying the following:

numbers = re.findall(r'\d+\.*\d*\w[-\w]*', example)

This finds every number except the single non-decimal one (i.e. '1'):

print(numbers)

['2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']

How do I need to adjust my regex to get the desired output?

Answer 1

This works:

numbers = re.findall(r'\d[0-9a-z\-\.]*', example)
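A self-contained check of this pattern against the example string from the question. Inside the character class, the escaped `-` and `.` are literal characters, so each match runs until it hits something outside the class, such as `/`.

```python
import re

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'

# Start at any digit, then greedily take digits, lower-case letters, '-' and '.';
# '/' is not in the class, so each match stops at the next separator.
numbers = re.findall(r'\d[0-9a-z\-\.]*', example)
print(numbers)  # ['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']
```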

Answer 2

The pattern does not match the single 1 because \d+\.*\d*\w[-\w]* requires at least 2 characters: at least 1 digit for \d+ and at least 1 word character for \w.
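To see this two-character minimum in action, the question's pattern can be tried with `re.fullmatch` on a lone digit and on a two-digit number. A minimal sketch:

```python
import re

question_pattern = r'\d+\.*\d*\w[-\w]*'

# A lone digit cannot satisfy both \d+ and the mandatory \w after it.
print(re.fullmatch(question_pattern, '1'))   # None
# Two characters are enough: after backtracking, \d+ takes '1' and \w takes '0'.
print(re.fullmatch(question_pattern, '10') is not None)  # True
```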

If the address should not end with a -, only the characters a-z may follow the digits, and you use case-insensitive matching:

\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*

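\b a word boundary
\d+(?:\.\d+)? match digits with an optional decimal part
[a-z]* match optional characters a-z
(?:-\w+)* optionally repeat matching a - followed by 1 or more word characters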
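The same pattern can also be written with `re.VERBOSE`, which keeps each piece next to its explanation. This is a sketch rather than part of the original answer; it adds `re.IGNORECASE` because the answer assumes case-insensitive matching.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and treats '#' as the start of a comment.
pattern = re.compile(r"""
    \b              # word boundary: the number must not continue a word
    \d+(?:\.\d+)?   # digits with an optional decimal part, e.g. 10 or 10.1
    [a-z]*          # optional letters right after the number, e.g. the a in 3a
    (?:-\w+)*       # zero or more groups of a hyphen plus word characters, e.g. -12c
""", re.VERBOSE | re.IGNORECASE)

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
print(pattern.findall(example))
# ['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']
```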

Regex demo

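Note that matching addresses can be difficult because there are many different notations; this pattern matches the formats given in the example string.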

import re

example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
pattern = r"\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*"
print(re.findall(pattern, example, flags=re.IGNORECASE))

Output

['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']
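The class `[a-z]` only picks up upper-case suffixes when the match is case-insensitive, as the answer assumes. A minimal comparison with and without `re.IGNORECASE`, using a hypothetical input that is not from the question:

```python
import re

pattern = r"\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*"
upper = 'Examplestreet 3A/12A-12C'  # hypothetical input with upper-case suffixes

# Without the flag, [a-z] does not match 'A', so the suffixes and the range are lost.
print(re.findall(pattern, upper))                        # ['3', '12', '12']
# With re.IGNORECASE, [a-z] also matches A-Z.
print(re.findall(pattern, upper, flags=re.IGNORECASE))   # ['3A', '12A-12C']
```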

Answer 3

Using a regex

Working example: https://regex101.com/r/PDYSgH/1

import re
example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'
numbers = re.findall(r'\d[a-z0-9.\-]*', example)
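One difference from Answer 2: since `-` is inside the character class, this pattern also keeps a trailing hyphen, which Answer 2's `(?:-\w+)*` rules out. A short comparison on a hypothetical malformed input:

```python
import re

messy = 'Examplestreet 5-/6'  # hypothetical input: a house number with a stray '-'

# The character-class pattern keeps the trailing '-' because '-' is in the class.
print(re.findall(r'\d[a-z0-9.\-]*', messy))                   # ['5-', '6']
# Answer 2's pattern needs word characters after each '-', so the '-' is dropped.
print(re.findall(r'\b\d+(?:\.\d+)?[a-z]*(?:-\w+)*', messy))   # ['5', '6']
```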

Using split

Alternatively, you can split the string on the space and then on /.

numbers = example.split(" ")[-1].split("/")
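A runnable version of the split approach. It assumes the numbers form the last space-separated token; the second part uses a hypothetical input with a trailing space to show why `split()` without an argument can be the safer choice.

```python
example = 'Examplestreet 1/2.1/3a/10/10.1/11b/12a-12c/13a-c'

# Assumes the house numbers are the last space-separated token of the string.
numbers = example.split(" ")[-1].split("/")
print(numbers)  # ['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']

padded = 'Examplestreet 1/2.1 '  # hypothetical input with a trailing space
# split(" ") produces an empty last token here, so nothing useful is left.
print(padded.split(" ")[-1].split("/"))  # ['']
# split() with no argument splits on runs of whitespace and ignores leading/trailing whitespace.
print(padded.split()[-1].split("/"))     # ['1', '2.1']
```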

Answer 4

Another solution, which seems simpler:

>>> re.findall(r'\d[^/]*', example)
['1', '2.1', '3a', '10', '10.1', '11b', '12a-12c', '13a-c']

You can confirm it works here (although I had to escape the slash (/) character).

\d[^/]*: matches any string that starts with a digit followed by any characters except / (it stops at that character).
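Because `[^/]*` accepts everything except `/`, this relies on the numbers sitting at the end of the string. A sketch with a hypothetical input that has text after the last number, plus one possible tightening that is not part of the original answer:

```python
import re

with_city = 'Examplestreet 1/2, Berlin'  # hypothetical input with trailing text

# [^/]* runs until the next '/' or the end of the string, so it swallows ', Berlin'.
print(re.findall(r'\d[^/]*', with_city))     # ['1', '2, Berlin']
# Possible tightening (an assumption): also stop at whitespace and commas.
print(re.findall(r'\d[^/\s,]*', with_city))  # ['1', '2']
```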


python

regex

python-re