Regex negative lookahead ignoring comments

Question

I'm having trouble with this regular expression. I only want to pull out MATCH3, because MATCH1 and MATCH2 are commented out.

#   url(r'^MATCH1/$',),
   #url(r'^MATCH2$',),
    url(r'^MATCH3$',), # comment

My regex captures all three matches.

(?<=url\(r'\^)(.*?)(?=\$',)

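How do I ignore lines that begin with a comment? A negative lookahead? Note that the # character is not necessarily at the start of the line.

Edit: Sorry, all the answers are good! The original example was missing the comma after $' at the end of the match group.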

Answer 1

^\s*#.*$|(?<=url\(r'\^)(.*?)(?=\$'\))

Try this. The first alternative consumes commented lines; grab the capture group for the real matches. See the demo.

https://www.regex101.com/r/rK5lU1/37

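import re

# Comment lines are consumed by the first alternative; real matches land in group 1.
p = re.compile(r"^\s*#.*$|(?<=url\(r'\^)(.*?)(?=\$'\))", re.IGNORECASE | re.MULTILINE)
test_str = "# url(r'^MATCH1/$'),\n #url(r'^MATCH2$'),\n url(r'^MATCH3$') # comment"

# findall yields '' for each comment line, so drop the empty strings.
print([m for m in p.findall(test_str) if m])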

Answer 2

If this is the only case you need to match, match the start of the line, followed by optional whitespace, followed by url:

(?m)^\s*url\(r'(.*?)'\)
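
As a minimal sketch, the pattern above could be applied like this. The sample text is the comma-free variant of the question's snippet, and the variable names are only illustrative:

```python
import re

# Comma-free variant of the snippet from the question
source = """#   url(r'^MATCH1/$'),
   #url(r'^MATCH2$'),
    url(r'^MATCH3$') # comment"""

# (?m) lets ^ match at every line start; \s* skips indentation, so a line
# whose first non-blank character is '#' can never reach "url".
pattern = re.compile(r"(?m)^\s*url\(r'(.*?)'\)")

matches = pattern.findall(source)
print(matches)

# The capture still includes the ^ and $ anchors; strip them if unwanted.
print([m.strip('^$') for m in matches])
```

For the edited question, where each call ends in `',)`, the closing `'\)` would presumably need to become `',\)`.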

If you need to cover more complex cases, I suggest using ast.parse instead, since it actually understands Python's source parsing rules.

import ast

tree = ast.parse("""(
#   url(r'^MATCH1/$'),
   #url(r'^MATCH2$'),
    url(r'^MATCH3$') # comment
)""")

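class UrlCallVisitor(ast.NodeVisitor):
    def visit_Call(self, node):
        # Only plain-name calls such as url(...); attribute calls have no 'id'
        if getattr(node.func, 'id', None) == 'url':
            first = node.args[0] if node.args else None
            # ast.Constant replaces the deprecated ast.Str (Python 3.8+)
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                print(first.value.strip('$^'))

        # Keep walking so nested calls are visited too
        self.generic_visit(node)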

UrlCallVisitor().visit(tree)

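This prints the first string-literal argument of every call to a function named url; in this case it prints MATCH3. Note that the source passed to ast.parse must be well-formed Python source (hence the enclosing parentheses; without them a SyntaxError is raised).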
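
If a list is more convenient than printed output, the same idea can be sketched with ast.walk. The helper name extract_url_patterns is hypothetical, and the snippet is the trailing-comma version from the question wrapped in parentheses so that it parses:

```python
import ast

def extract_url_patterns(source):
    """Return the first string argument of every url(...) call in source."""
    patterns = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Skip calls that are not to a plain name `url` or have no arguments
        if getattr(node.func, "id", None) != "url" or not node.args:
            continue
        first = node.args[0]
        if isinstance(first, ast.Constant) and isinstance(first.value, str):
            patterns.append(first.value.strip("$^"))
    return patterns

snippet = """(
#   url(r'^MATCH1/$',),
   #url(r'^MATCH2$',),
    url(r'^MATCH3$',), # comment
)"""

result = extract_url_patterns(snippet)
print(result)
```

Because the parser discards comments before the tree is built, the commented-out calls never appear as nodes, wherever the # sits on the line.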

Answer 3

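You don't really need lookarounds here. You can look for optional leading whitespace, then match "url" and the context before the name, capturing only the part you want to keep.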

>>> import re
>>> s = """#   url(r'^MATCH1/$',),
   #url(r'^MATCH2$',),
    url(r'^MATCH3$',), # comment"""
>>> re.findall(r"(?m)^\s*url\(r'\^([^$]+)", s)
['MATCH3']
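
For comparison, the negative lookahead the question asks about could be sketched as follows. This is one possible formulation, not something taken from the answers, and the variable names are illustrative:

```python
import re

s = """#   url(r'^MATCH1/$',),
   #url(r'^MATCH2$',),
    url(r'^MATCH3$',), # comment"""

# (?![ \t]*#) rejects any line whose first non-blank character is '#'.
# .*? then allows other code to precede url( on the same line.
pattern = re.compile(r"(?m)^(?![ \t]*#).*?url\(r'\^([^$]+)")

names = pattern.findall(s)
print(names)
```

The lookahead only inspects the start of the line, so a url(...) call that appears inside a trailing comment after real code would still be matched.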

Answer 4

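As an alternative, you can split each line on '#' and, if the first element contains 'url' (meaning the call is not commented out), use re.search to match the substring you want: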

>>> [re.search(r"url\(r'\^(.*?)\$'" ,i[0]).group(1) for i in [line.split('#') for line in s.split('\n')] if 'url' in i[0]]
['MATCH3']

Also note that you don't need to use lookarounds for your pattern; you can just use grouping!
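
The same split-then-search idea, unrolled into a loop for readability. This is only a sketch; s is the sample text from the question and the other names are illustrative:

```python
import re

s = """#   url(r'^MATCH1/$',),
   #url(r'^MATCH2$',),
    url(r'^MATCH3$',), # comment"""

# A capturing group replaces the lookbehind/lookahead pair
pattern = re.compile(r"url\(r'\^(.*?)\$'")

found = []
for line in s.split("\n"):
    code = line.split("#")[0]      # text before the first '#'
    match = pattern.search(code)   # None when the code part has no url(...)
    if match:
        found.append(match.group(1))

print(found)
```

Splitting on '#' assumes the character never occurs inside the URL pattern itself; a pattern containing '#' would be cut short.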

python

regex

negative-lookahead