Reuse the same prefix to find the next match, if any
I have a string like this:
string = '
something .... something else ...
url="/transfer/packages/00000000-0000-0000-0000-000000000000/connectors/68f74d66-ca3d-4272-9b59-4f737946b3f7/something/138bb190-3b12-4855-88e2-0d1cdf46aeb5/...../...../...../...../...."
other things ...
'
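There are no CR/LF characters in it; everything is on one line.
I want to create a regular expression that:
- matches if and only if the url starts with /transfer/packages/
- captures every subsequent GUID
- up to the end of the quoted string (")
- the number of GUIDs to find is unknown, but there is at least one
So far I have written: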
\/transfer\/packages\/[^"]*([A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12})"
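But it captures only the last GUID. I need to somehow reuse the prefix /transfer/packages/ and keep matching, greedily extending the search each time instead of starting over from the prefix.
From this SO answer: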
As for the second question, it is a common problem. It is not possible to get an arbitrary number of captures with a PCRE regex, as in case of repeated captures only the last captured value is stored in the group buffer. You cannot have more submatches in the resulting array than the number of capturing groups inside the regex pattern. See Repeating a Capturing Group vs. Capturing a Repeated Group for more details.
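Python's re module behaves the same way as PCRE here. The snippet below is only a small illustration of that point, not the asker's original code: the shortened sample string and the names GUID, repeated, and text are made up for the demo.

```python
import re

GUID = r"[A-Za-z0-9]{8}-(?:[A-Za-z0-9]{4}-){3}[A-Za-z0-9]{12}"

# Shortened, hypothetical sample with two GUIDs after the prefix.
text = ('url="/transfer/packages/00000000-0000-0000-0000-000000000000'
        '/connectors/68f74d66-ca3d-4272-9b59-4f737946b3f7/x"')

# The capturing group sits inside a repeated group, so it is matched twice,
# but each repetition overwrites the previous value of group 1.
repeated = re.compile(r'/transfer/packages/(?:[^"]*?(' + GUID + r'))+[^"]*"')

m = repeated.search(text)
print(m.group(1))   # 68f74d66-ca3d-4272-9b59-4f737946b3f7 (only the last one)
print(m.groups())   # a 1-tuple: one capturing group, one stored value

# Running a second pass over the matched text recovers every GUID.
all_guids = re.findall(GUID, m.group(0))
print(all_guids)
```

This is why the answers that follow either validate the prefix separately or use an engine feature such as a variable-length lookbehind.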
If you are using the re module in Python, you can check the prefix with str.startswith and try:
import re

url = "/transfer/packages/00000000-0000-0000-0000-000000000000/connectors/68f74d66-ca3d-4272-9b59-4f737946b3f7/something/138bb190-3b12-4855-88e2-0d1cdf46aeb5/...../...../...../...../...."
# Only look for GUIDs when the URL has the required prefix.
if url.startswith('/transfer/packages/'):
    # (?i) makes the [a-z0-9] classes case-insensitive.
    guid_list = re.findall(r'(?i)[a-z0-9]{8}(?:-[a-z0-9]{4}){3}-[a-z0-9]{12}', url)
    print(guid_list)
You can use the PyPI regex module, which supports unbounded-length quantifiers inside a lookbehind:
(?<=url="/transfer/packages/[^\r\n"]*)[A-Za-z0-9]{8}-(?:[A-Za-z0-9]{4}-){3}[A-Za-z0-9]{12}(?=[^\r\n"]*")
See a regex demo (with another engine selected for demo purposes) or a Python demo.
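A minimal sketch of how the lookbehind pattern above might be used from Python. It assumes the third-party regex package is installed (pip install regex); the standard re module rejects this pattern because its lookbehinds must be fixed-width. The shortened sample string and the names PATTERN, test_str, and guids are illustrative only.

```python
try:
    import regex  # third-party package, not the standard library "re"
except ImportError:
    regex = None  # the sketch degrades gracefully when it is not installed

PATTERN = (
    r'(?<=url="/transfer/packages/[^\r\n"]*)'                # prefix earlier in the same quoted string
    r'[A-Za-z0-9]{8}-(?:[A-Za-z0-9]{4}-){3}[A-Za-z0-9]{12}'  # the GUID itself
    r'(?=[^\r\n"]*")'                                        # a closing quote must follow
)

# Hypothetical input: two GUIDs inside the url, one stray GUID outside it.
test_str = ('something url="/transfer/packages/00000000-0000-0000-0000-000000000000'
            '/connectors/68f74d66-ca3d-4272-9b59-4f737946b3f7/something" '
            'other 138bb190-3b12-4855-88e2-0d1cdf46aeb5')

# Each match is a single GUID; the lookarounds only assert context and
# consume nothing, so findall can return every GUID in the quoted URL.
guids = regex.findall(PATTERN, test_str) if regex is not None else None
print(guids)
```

The stray GUID after the closing quote is not expected to match, because the [^\r\n"]* part of the lookbehind cannot cross a double quote.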
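Another option is to first match the quoted string that starts with "/transfer/packages/ followed by a GUID, and then match up to the next double quote.
Then you can use re.findall on that match to get all the GUIDs.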
"/transfer/packages/[A-Za-z0-9]{8}-(?:[A-Za-z0-9]{4}-){3}[A-Za-z0-9]{12}[^"\r\n]*"
For example:
import re

regex = r'"/transfer/packages/[A-Za-z0-9]{8}-(?:[A-Za-z0-9]{4}-){3}[A-Za-z0-9]{12}[^"\r\n]*"'
test_str = ("something .... something else ...\n"
            "url=\"/transfer/packages/00000000-0000-0000-0000-000000000000/connectors/68f74d66-ca3d-4272-9b59-4f737946b3f7/something/138bb190-3b12-4855-88e2-0d1cdf46aeb5/...../...../...../...../....\"\n"
            "other things ...\n\n"
            "68f74d66-ca3d-4272-9b59-4f737946b300")

# Step 1: find each quoted URL with the required prefix.
# Step 2: extract every GUID inside that URL only.
for quoted_url in re.findall(regex, test_str):
    print(re.findall(r"[A-Za-z0-9]{8}-(?:[A-Za-z0-9]{4}-){3}[A-Za-z0-9]{12}", quoted_url))
Output
['00000000-0000-0000-0000-000000000000', '68f74d66-ca3d-4272-9b59-4f737946b3f7', '138bb190-3b12-4855-88e2-0d1cdf46aeb5']