Python 2.7.16 - regex lookbehind not working with findall

Question

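I have the code below. Note: the line variable comes from a line of a text file I am reading, and the pattern variable is stored in a config file that I select and apply in the code.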

import re

line = "[u'INVOICE# SMR/0038 f\"', u'', u'', u'']"
pattern = r'(?<=(invoice#)\s)[A-z]{3}/\d{1,5}'

regex = re.compile(pattern, re.IGNORECASE)
invNum = re.findall(pattern, str(line), re.IGNORECASE)[0]
      ........

I expect to get invNum = SMR/0038, but I get invoice# instead. What is wrong? If I try this pattern on https://regexr.com/ I see that the lookbehind is working, but transferring it to Python code doesn't work. See the image below from https://regexr.com/

[image: sample from regexr]

Answer 1

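You get the invoice# substring because you wrapped it in a capturing group. When a pattern contains a capturing group, re.findall returns the captured text instead of the whole match.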
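To make the difference visible, here is a minimal sketch that runs the question's pattern next to a variant with the parentheses removed from the lookbehind. The variable names with_group and without_group are only illustrative, and the second pattern is an assumed alternative that keeps the lookbehind, not the approach recommended further down.

```python
import re

line = "[u'INVOICE# SMR/0038 f\"', u'', u'', u'']"

# Capturing group inside the lookbehind: findall returns the group's text.
with_group = re.findall(r'(?<=(invoice#)\s)[A-z]{3}/\d{1,5}', line, re.IGNORECASE)

# No capturing group: findall returns the whole match.
without_group = re.findall(r'(?<=invoice#\s)[A-Za-z]{3}/\d{1,5}', line, re.IGNORECASE)

print(with_group)     # the text captured by (invoice#)
print(without_group)  # the invoice number itself
```

Note that Python's re module only accepts fixed-width lookbehinds, so the variant without the group works with a single \s but would not accept \s+ inside the lookbehind.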

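Also, note that [A-z] matches more than just letters: it also covers the characters [, \, ], ^, _ and ` that sit between Z and a in ASCII. It is one of the most confusing patterns in the regex world. Use [A-Za-z].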
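A quick check of that claim, as a small sketch: the string extras below is just an illustrative sample holding the six ASCII characters that fall between Z and a.

```python
import re

# The six ASCII characters between 'Z' (0x5A) and 'a' (0x61): [ \ ] ^ _ `
extras = '[\\]^_`'

loose = re.findall(r'[A-z]', extras)      # the range spans the punctuation too
strict = re.findall(r'[A-Za-z]', extras)  # letters only, so nothing matches

print(loose)
print(strict)
```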

You need to capture the part you want to extract, and you do not even need a lookbehind:

import re
line = "[u'INVOICE# SMR/0038 f\"', u'', u'', u'']"
# Capture only the invoice number; the raw string keeps \s and \d intact
pattern = re.compile(r'invoice#\s+([A-Za-z]{3}/\d{1,5})', re.I)
print(re.findall(pattern, line))  # => ['SMR/0038']

See the online demo.

In fact, since you only want a single match, use re.search (re.findall returns all matches):

m = pattern.search(line)
if m:
    print(m.group(1)) # => SMR/0038
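Since the question indexes the findall result with [0], which raises IndexError on a line with no invoice number, the search-based approach could be wrapped in a small helper. This is only a sketch: extract_invoice_number and INVOICE_RE are hypothetical names, and the pattern is hard-coded here rather than read from the config file the question mentions.

```python
import re

# Hypothetical module-level pattern; in the question it comes from a config file.
INVOICE_RE = re.compile(r'invoice#\s+([A-Za-z]{3}/\d{1,5})', re.I)

def extract_invoice_number(line):
    """Return the first invoice number found in line, or None if there is none."""
    m = INVOICE_RE.search(str(line))
    return m.group(1) if m else None

print(extract_invoice_number("[u'INVOICE# SMR/0038 f\"', u'', u'', u'']"))
print(extract_invoice_number("no invoice here"))
```

Returning None leaves it to the caller to decide how to handle lines without an invoice number.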

Tags: python, regex, lookbehind