重复使用相同的前缀来查找下一个匹配项(如果有)

Reuse the same prefix to find the next match if any

我有这样的字符串:

string = '
something .... something else ...
url="/transfer/packages/00000000-0000-0000-0000-000000000000/connectors/68f74d66-ca3d-4272-9b59-4f737946b3f7/something/138bb190-3b12-4855-88e2-0d1cdf46aeb5/...../...../...../...../...."
other things ...
'

没有任何CR/LF,都在一条线上

我想创建一个正则表达式:

到目前为止我写了:

\/transfer\/packages\/[^"]*([A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12})"

但它只捕获最后一个 guid。我需要一些如何重用前缀 /transfer/packages/ 并保持匹配,每次都热切地扩展搜索而不从前缀继续。

来自这个 SO :

As for the second question, it is a common problem. It is not possible to get an arbitrary number of captures with a PCRE regex, as in case of repeated captures only the last captured value is stored in the group buffer. You cannot have more submatches in the resulting array than the number of capturing groups inside the regex pattern. See Repeating a Capturing Group vs. Capturing a Repeated Group for more details.

如果您在 Python 中使用 re 模块,那么可以使用 str.startwith 并尝试:

import re
url="/transfer/packages/00000000-0000-0000-0000-000000000000/connectors/68f74d66-ca3d-4272-9b59-4f737946b3f7/something/138bb190-3b12-4855-88e2-0d1cdf46aeb5/...../...../...../...../...."
if url.startswith('/transfer/packages/'):
    Guid_List = re.findall(r'(?i)[a-z0-9]{8}(?:-[a-z0-9]{4}){3}-[a-z0-9]{12}', url)
print(Guid_List)

您可以在回顾中使用支持无限长度量词的 PyPi regex module

(?<=url="/transfer/packages/[^\r\n"]*)[A-Za-z0-9]{8}-(?:[A-Za-z0-9]{4}-){3}[A-Za-z0-9]{12}(?=[^\r\n"]*")

示例Regex demo (with another engine selected for demo purpose) or see a Python demo


另一种选择是先匹配 url="/transfer/packages/ 后跟 guid 的行,然后匹配到下一个双引号。

然后你可以使用 re.findall 来获取所有的 guids。

"/transfer/packages/[A-Za-z0-9]{8}-(?:[A-Za-z0-9]{4}-){3}[A-Za-z0-9]{12}[^"\r\n]*"

Regex demo | Python demo

例如:

import re

regex = r'"/transfer/packages/[A-Za-z0-9]{8}-(?:[A-Za-z0-9]{4}-){3}[A-Za-z0-9]{12}[^"\r\n]*"'
test_str = ("something .... something else ...\n"
    "url=\"/transfer/packages/00000000-0000-0000-0000-000000000000/connectors/68f74d66-ca3d-4272-9b59-4f737946b3f7/something/138bb190-3b12-4855-88e2-0d1cdf46aeb5/...../...../...../...../....\"\n"
    "other things ...\n\n"
    "68f74d66-ca3d-4272-9b59-4f737946b300")

for str in re.findall(regex, test_str):
    print(re.findall(r"[A-Za-z0-9]{8}-(?:[A-Za-z0-9]{4}-){3}[A-Za-z0-9]{12}", str))

输出

['00000000-0000-0000-0000-000000000000', '68f74d66-ca3d-4272-9b59-4f737946b3f7', '138bb190-3b12-4855-88e2-0d1cdf46aeb5']