Regex matching with end dollar sign on URL pattern in Python

Question

Here is another scenario: I want to extract the secondary path segment from a URL, so all of the URLs below should return 'a-c-d'.

/opportunity/a-c-d
/opportunity/a-c-d/
/opportunity/a-c-d/123/456/
/opportunity/a-c-d/?x=1
/opportunity/a-c-d?x=1

My code snippet is as follows:

m = re.match("^/opportunity/([^/]+)[\?|/|$]", "/opportunity/a-c-d")
if m:
    print(m.group(1))

It works for every possible URL except the first one, /opportunity/a-c-d. Can anyone explain why and help me correct my regex? Thanks a lot!

Answer 1

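Alternation belongs inside a group (), not inside []. Square brackets define a character class, which matches exactly one of the listed characters, so inside it | and $ are just literal characters.

You should also use a raw string, so that escape sequences reach the re module literally instead of being interpreted by Python's string parser first.

Once $ is a real end-of-string alternative, the first group has to exclude ? as well; otherwise the greedy [^/]+ swallows the query string in /opportunity/a-c-d?x=1.

m = re.match(r"^/opportunity/([^/?]+)(\?|/|$)", "/opportunity/a-c-d")

or

m = re.match(r"^/opportunity/([^/?]+)([?/]|$)", "/opportunity/a-c-d")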
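To see the difference between the character class and the grouped alternation, here is a small sketch that runs both patterns over the URLs from the question. The names URLS, CHAR_CLASS, GROUPED and first_group are made up for this illustration; the second pattern uses a non-capturing group (?:...) only to keep the group numbering simple.

```python
import re

URLS = [
    "/opportunity/a-c-d",
    "/opportunity/a-c-d/",
    "/opportunity/a-c-d/123/456/",
    "/opportunity/a-c-d/?x=1",
    "/opportunity/a-c-d?x=1",
]

# The question's pattern: [\?|/|$] is a character class, so it requires one
# literal character out of ?, |, / or $ and can never match at end of string.
CHAR_CLASS = re.compile(r"^/opportunity/([^/]+)[\?|/|$]")

# Grouped alternation: here $ really is the end-of-string anchor.
GROUPED = re.compile(r"^/opportunity/([^/?]+)(?:[?/]|$)")

def first_group(pattern, url):
    """Return group 1 of the match, or None if the pattern does not match."""
    m = pattern.match(url)
    return m.group(1) if m else None

for url in URLS:
    print(url, first_group(CHAR_CLASS, url), first_group(GROUPED, url))
```

Only the first URL should differ between the two columns: the character class needs some character after a-c-d, and there is none.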

Answer 2

Inside a character class, $ matches a literal '$' character rather than the end of the line. You probably want this instead:

m = re.match(r"^/opportunity/([^/?]+)\/?\??", "/opportunity/a-c-d")
if m:
    print(m.group(1))
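Since both trailing pieces in that pattern are optional, the captured group is decided entirely by [^/?]+. A minimal sketch of the same idea without them, assuming you only care about the segment right after /opportunity/ (PATTERN and secondary_path are hypothetical names):

```python
import re

# [^/?]+ stops at the first slash or question mark, so nothing needs to be
# matched after the group.
PATTERN = re.compile(r"^/opportunity/([^/?]+)")

def secondary_path(url):
    """Return the segment after /opportunity/, or None if there is none."""
    m = PATTERN.match(url)
    return m.group(1) if m else None

for url in ["/opportunity/a-c-d", "/opportunity/a-c-d/?x=1", "/opportunity/a-c-d?x=1"]:
    print(url, "=>", secondary_path(url))
```

If your URLs can carry a fragment, you would probably want to add # to the excluded characters too.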

Answer 3

Don't do this with a regex. Use the urlparse module instead (urllib.parse in Python 3).

Here is some test code:

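from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

urls = [
  '/opportunity/a-c-d',
  '/opportunity/a-c-d/',
  '/opportunity/a-c-d/123/456/',
  '/opportunity/a-c-d/?x=1',
  '/opportunity/a-c-d?x=1',
]

def secondary(url):
  # Splitting the path on '/' gives '' (before the leading slash) at index 0,
  # 'opportunity' at index 1, and the segment we want at index 2.
  try:
    return urlparse(url).path.split('/')[2]
  except IndexError:
    return None

for url in urls:
  print('{0:30s} => {1}'.format(url, secondary(url)))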

Here is the output:

/opportunity/a-c-d             => a-c-d
/opportunity/a-c-d/            => a-c-d
/opportunity/a-c-d/123/456/    => a-c-d
/opportunity/a-c-d/?x=1        => a-c-d
/opportunity/a-c-d?x=1         => a-c-d
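If you also want to make sure the first segment really is opportunity, or you get full URLs with a scheme and host, a slightly stricter variant could look like the sketch below. The prefix parameter and the example.com URL are assumptions for illustration only; note that filtering out empty segments also tolerates doubled slashes, which may or may not be what you want.

```python
from urllib.parse import urlparse

def secondary(url, prefix="opportunity"):
    """Return the path segment after `prefix`, or None if the URL does not fit."""
    # urlparse separates the path from the query string and fragment,
    # so only the path segments are left to inspect.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == prefix:
        return parts[1]
    return None

print(secondary("/opportunity/a-c-d/?x=1"))
print(secondary("https://example.com/opportunity/a-c-d?x=1#top"))
print(secondary("/other/a-c-d"))
```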

Answer 4

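Wrap the part you need in a capturing group () and replace the whole string with that group. This assumes the segment always has the form word-word-word; the leading .*? is lazy so that it does not eat into the first word.

[re.sub(r'.*?(\w+-\w+-\w+).*', r'\1', x) for x in urls]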
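One thing to watch with re.sub: when the pattern does not match, it returns the input string unchanged, so a bad URL silently comes back as if it were a result. A minimal sketch of the same word-word-word assumption using re.search, which lets you tell a miss apart (SLUG and slug_or_none are hypothetical names):

```python
import re

# Same assumption as above: the wanted segment looks like word-word-word.
SLUG = re.compile(r"\w+-\w+-\w+")

def slug_or_none(url):
    """Return the first word-word-word run in the URL, or None if there is none."""
    m = SLUG.search(url)
    return m.group(0) if m else None

print(slug_or_none("/opportunity/a-c-d?x=1"))
print(slug_or_none("/opportunity/"))
```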

python

regex

regex-greedy