Match %, avoid matching \%

Question

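I'm trying to remove comments from my TeX code. I want to trim the text after a %, but leave an escaped \% alone. I thought this would do it:

re.sub(r"([^%]*)([^\\][%])(.*)$", r"\1", "10 \% foo.% bar")

which outputs something almost right: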

'10 \% foo'

Expected output:

'10 \% foo.'

Why does it trim the last character before the %? And how can I avoid that?

Answer 1

Try escaping the . with \, changing:

re.sub(r"([^%]*)([^\\][%])(.*)$", r"\1", "10 \% foo.% bar")

to:

re.sub(r"([^%]*)([^\\][%])(.*)$", r"\1", "10 \% foo\.% bar")

Answer 2

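Your problem is that your regex matches [zero or more non-percent characters (group 1)], and then it matches [one non-backslash character and a percent character (group 2)].

You replace the whole match with the first group, so you lose the non-backslash character captured in group 2.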
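
To see where the character goes, it may help to print the captured groups. The snippet below is a small sketch that assumes the question's pattern had the backslash escaped inside the character class and used \1 as the replacement; the variable names are only illustrative.

```python
import re

# The pattern from the question, with the backslash escaped inside the class.
pattern = re.compile(r"([^%]*)([^\\][%])(.*)$")
text = r"10 \% foo.% bar"

m = pattern.search(text)
# Group 1 has to stop one character early so that group 2 can take ".%".
print(m.groups())  # (' foo', '.%', ' bar')

# Replacing the whole match with group 1 alone therefore drops the ".".
clipped = pattern.sub(r"\1", text)
print(clipped)  # 10 \% foo
```

The match only starts at the space before foo, which is why the leading 10 \% survives untouched.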

Instead, use a negative lookbehind, which only matches a percent character that has no backslash before it, followed by everything up to the end of the line:

(?<!\\)%.*$

In Python:

>>> re.sub(r"(?<!\\)%.*$", "", r"10 \% foo.% bar")
'10 \\% foo.'

For multi-line strings, use the re.M flag:

>>> ss = r"""10 \% foo.% bar
Hello world
Hello world % this is a comment
% This is also a comment
"""
>>> print(re.sub(r"(?<!\\)%.*$", "", ss, flags=re.M))
10 \% foo.
Hello world
Hello world
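
If this is needed in more than one place, the same idea could be wrapped in a small helper. This is only a sketch: the names COMMENT_RE and strip_tex_comments are made up for illustration, and it keeps the simple rule from this answer (a % counts as a comment unless a backslash sits directly before it).

```python
import re

# A "%" not immediately preceded by a backslash, plus the rest of that line.
# re.MULTILINE makes "$" match at the end of every line, not only the last.
COMMENT_RE = re.compile(r"(?<!\\)%.*$", flags=re.MULTILINE)


def strip_tex_comments(source: str) -> str:
    """Remove unescaped %-comments from each line (hypothetical helper)."""
    return COMMENT_RE.sub("", source)


sample = (
    "10 \\% foo.% bar\n"
    "Hello world % this is a comment\n"
    "% This is also a comment\n"
)
stripped = strip_tex_comments(sample)
print(repr(stripped))
```

Note that this leaves trailing spaces and empty lines behind, and it would also treat the % in \\% (a line break followed by a real comment) as escaped, so real TeX sources may need a stricter rule.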

Answer 3

You can use a negative lookbehind to make sure you only match % and not \%.

(?<!\\)%.*

See the re documentation: https://docs.python.org/3/library/re.html
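
This version has no trailing $, which should not matter here: by default . does not match a newline, so .* stops at the end of the line anyway. A quick check, using a made-up two-line input:

```python
import re

# Illustrative input: one escaped \% and two real comments.
text = "a \\% b % c\nd % e"

# With "$" the re.M flag is needed so that "$" matches at every line end.
with_dollar = re.sub(r"(?<!\\)%.*$", "", text, flags=re.M)
# Without "$" no flag is needed, because "." never crosses a newline.
without_dollar = re.sub(r"(?<!\\)%.*", "", text)

print(with_dollar == without_dollar)  # True
```

The two would only differ if re.DOTALL were set, since . would then run past the line break.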

python

python-re