How can I tell this regex to ignore www URLs and catch both ('Inter.ttf') and (Inter.ttf)?

Question

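I currently have a regex that rewrites CSS URLs by prepending '/static/user/' to each one. However, I want it to ignore www URLs (https://www.google.com) and match both url('Inter-Black.ttf') and url(Inter-Black.ttf), that is, with and without quotes.

This is my current code: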

p = re.compile(r"(?<=url\(')/?(?=(?:.*?\.\w{3})'\))")
filez = re.sub(p, '/static/user/', filez)

I also tried this, but it still doesn't work:

p = re.compile(r"^(?!.*(www))(?<=url\(')/?(?=(?:.*?\.\w{3})'\))")
filez = re.sub(p, '/static/user/', filez)

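For now I'm using the two regexes below to handle both the quoted and the unquoted form. Ideally, though, I'd like a cleaner solution that also ignores www URLs.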

p = re.compile(r"(?<=url\(')/?(?=(?:.*?\.\w{3})'\))")
filez = re.sub(p, '/static/user/', filez)
p = re.compile(r"(?<=url\()/?(?=(?:.*?\.\w{3})\))")
filez = re.sub(p, '/static/user/', filez)

Thank you for your time.

Answer 1

Hope this helps.

import re

filez = """url(https://www.google.com)
url('Inter-Black.ttf')
url(Inter-Black.ttf)
"""


p = re.compile(r"(url\(\'?)((?!.*www).*\.\w{3})(\'?\))")

# Keep the captured groups and insert the prefix between groups 1 and 2.
filez = re.sub(p, r'\1/static/user/\2\3', filez)

print(filez)

Output:

url(https://www.google.com)
url('/static/user/Inter-Black.ttf')
url(/static/user/Inter-Black.ttf)
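
One caveat, offered as a sketch rather than a correction: `(?!.*www)` scans to the end of the line, so a relative URL that shares a line with a www URL is skipped as well. Assuming that is not the desired behavior, the lookahead and the body can be confined to the current pair of parentheses. The sample line and the name `scoped` are assumptions made for illustration.

```python
import re

# Hypothetical input: a relative URL and a www URL on the same line.
css = "src: url(Inter-Black.ttf), url(https://www.google.com);"

# Pattern from the answer: the lookahead scans to the end of the line.
line_wide = re.compile(r"(url\('?)((?!.*www).*\.\w{3})('?\))")

# Variant (an assumption, not from the answer): stay inside the current url(...).
scoped = re.compile(r"(url\('?)((?![^)]*www)[^)']*\.\w{3})('?\))")

repl = r"\1/static/user/\2\3"
line_wide_result = line_wide.sub(repl, css)
scoped_result = scoped.sub(repl, css)

print(line_wide_result)  # the line is left untouched
print(scoped_result)     # only the relative URL is prefixed
```

Both patterns still rely on `\.\w{3}`, so only three-character extensions are matched.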

Answer 2

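You can use an optional capture group together with a conditional (an if clause) so that an opening ' has to be paired with a closing '.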

(url\()(?!https?://www\.google\.com)(')?/?([^()']*)((?(2)')\))

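Pattern breakdown:

(url\() Capture group 1, matches url(
(?!https?://www\.google\.com) Negative lookahead, asserts that what follows is not that specific URL
(')? Optional capture group 2, matches '
/? Optionally matches a leading /
([^()']*) Capture group 3, matches any characters except ( ) or '
( Capture group 4
- (?(2)') Conditional: if group 2 took part in the match, match '
- \) Matches )
) Close group 4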
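
To see the conditional in isolation, the following minimal sketch runs the pattern against balanced and unbalanced quotes. The sample strings and the name `results` are assumptions made for illustration.

```python
import re

pattern = re.compile(
    r"(url\()(?!https?://www\.google\.com)(')?/?([^()']*)((?(2)')\))"
)

samples = [
    "url('Inter-Black.ttf')",  # opening and closing quote
    "url(Inter-Black.ttf)",    # no quotes at all
    "url('Inter-Black.ttf)",   # opening quote only
    "url(Inter-Black.ttf')",   # closing quote only
]

# True where the pattern matches, False where the quotes are unbalanced.
results = {s: bool(pattern.search(s)) for s in samples}

for sample, matched in results.items():
    print(f"{sample!r}: {matched}")
```

Only the first two samples match, because the closing ' is required exactly when group 2 captured an opening one.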

import re

filez = """url(https://www.google.com)
url('Inter-Black.ttf')
url(Inter-Black.ttf)
url('Inter-Black.ttf)
url('/fonts/user/Inter-Black.ttf')
url('/fonts/user/mywebsite/Inter-Black.ttf')
"""

p = r"(url\()(?!https?://www\.google\.com)(')?/?([^()']*)((?(2)')\))"
# Re-emit groups 1 to 4 and insert the prefix in place of the optional leading /.
filez = re.sub(p, r"\1\2/static/user/\3\4", filez)
print(filez)

Output

url(https://www.google.com)
url('/static/user/Inter-Black.ttf')
url(/static/user/Inter-Black.ttf)
url('Inter-Black.ttf)
url('/static/user/fonts/user/Inter-Black.ttf')
url('/static/user/fonts/user/mywebsite/Inter-Black.ttf')
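
The lookahead above targets one specific host and sits before the optional quote, so a quoted absolute URL such as url('https://www.google.com') would still be rewritten. As a hedged sketch, assuming the goal is to skip any URL that starts with http://www. or https://www., the lookahead can be generalized and moved after the optional quote. The function name `prefix_relative_urls` and the example.com URLs are hypothetical.

```python
import re

# Assumption: skip anything starting with http(s)://www., quoted or not.
PATTERN = re.compile(r"(url\()(')?(?!https?://www\.)/?([^()']*)((?(2)')\))")


def prefix_relative_urls(css: str, prefix: str = "/static/user/") -> str:
    """Insert prefix into url(...) values that do not point at a www host."""
    # A function replacement avoids backslash escaping issues in the prefix.
    return PATTERN.sub(
        lambda m: f"{m.group(1)}{m.group(2) or ''}{prefix}{m.group(3)}{m.group(4)}",
        css,
    )


css = """url(https://www.example.com/a.css)
url('https://www.example.com/a.css')
url('Inter-Black.ttf')
url(Inter-Black.ttf)
"""

rewritten = prefix_relative_urls(css)
print(rewritten)
```

Absolute URLs on hosts without a www prefix are still rewritten by this sketch, since the question only asks to exclude www URLs.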

python

regex

regex-lookarounds

python-re