Regex Difference Python2 and Python3

Question

I want to port this code from Python 2 to Python 3:

p = re.compile(ur"(?s)<u>(.*?)<\/u>")
subst = "\underline{\1}"
raw_html = re.sub(p, subst, raw_html)

I already know that ur should be changed to r:

p = re.compile(r"(?s)<u>(.*?)<\/u>")
subst = "\underline{\1}"
raw_html = re.sub(p, subst, raw_html)

But it does not work. It fails with this error:

cd build && PYTHONWARNINGS="ignore" python3 ../src/katalog.py --katalog 1
Traceback (most recent call last):
  File "src/katalog.py", line 11, in <module>
    from common import *
  File "src/common.py", line 207
    subst = "\underline{\1}"
                             ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
make: *** [katalog1] Error 1

However, changing it to "\\underline" did not help either. In that case the replacement simply does not happen.

Answer 1

Use

import re
raw_html = r"<u>1</u> and <u>2</u>"
p = re.compile(r"(?s)<u>(.*?)</u>")
subst = r"\\underline{\1}"
raw_html = re.sub(p, subst, raw_html)
print(raw_html)

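The result is \underline{1} and \underline{2}. Basically, inside the replacement pattern, use a double backslash to produce a single literal backslash; \1 then inserts the first captured group. Using raw string literals makes life with regular expressions in Python much easier.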
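The two escaping layers involved (the Python string literal and the re replacement template) can be sketched as follows. This is only an illustration under a few assumptions: the names broken_source, raw_template, plain_template and to_underline are made up for the example, and the replacement function is an alternative technique, not something the original code used.

```python
import re

pattern = re.compile(r"(?s)<u>(.*?)</u>")
text = "<u>1</u> and <u>2</u>"

# 1. In Python 3 every str literal is Unicode, so "\u" in a non-raw literal
#    starts a \uXXXX escape. Compiling the original line reproduces the error.
broken_source = r'subst = "\underline{\1}"'
try:
    compile(broken_source, "<demo>", "exec")
    literal_error = None
except SyntaxError as exc:
    literal_error = exc.msg

# 2. The raw literal hands the regex engine the template \\underline{\1}:
#    "\\" becomes one literal backslash and "\1" is the first group.
raw_template = r"\\underline{\1}"
# The same template written as a non-raw literal needs every backslash doubled.
plain_template = "\\\\underline{\\1}"
result = pattern.sub(raw_template, text)

# 3. Alternative (hypothetical helper): a replacement function. Its return
#    value is inserted as-is, with no template escape processing.
def to_underline(match):
    return "\\underline{" + match.group(1) + "}"

result_from_function = pattern.sub(to_underline, text)

print(result)
print(result_from_function)
```

The function form can be easier to read when the replacement contains many backslashes, as LaTeX output usually does, because only the Python literal escaping remains.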
