Python regex doesn't find certain pattern

I'm trying to parse LaTeX code out of HTML, like this:

string = " your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "

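I want to replace every piece of LaTeX code with the output of a function that takes the LaTeX code as its argument (since finding the right pattern is the problem, the function extract currently returns an empty string).

I tried: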

latex_end = "\)"
latex_start = "\("    
string = re.sub(r'{}.*?{}'.format(latex_start, latex_end), extract, string)

Result:

your answer is wrong! Solution: based on \= 0 \) and \=0\) beeing ...

Expected:

your answer is wrong! Solution: based on and beeing ...

Any idea why the pattern isn't found? Is there a way to do this?

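This happens because the backslash is an escape character, both in Python string literals and in regular expressions. The pattern built from "\(" and "\)" reaches the regex engine as \(.*?\), which matches a literal pair of parentheses and leaves the backslash in front of each opening parenthesis behind. That makes these cases tricky to handle. Here are two quick ways to get it done: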

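import re

# Placeholder: re.sub passes a match object and uses the returned string
extract = lambda a: ""

# Using no raw components: each backslash is doubled for Python, then again for the regex
string = " your answer is wrong! Solution: based on \\((\\vec{n_E},\\vec{g})= 0 \\) and \\(d(g,E)=0\\) beeing ... "
latex_bounds = ("\\\\\\(", "\\\\\\)")
print(re.sub('{}.*?{}'.format(*latex_bounds), extract, string))

# Using all raw components: only the regex-level escaping is left
string = r" your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "
latex_bounds = (r"\\\(", r"\\\)")
print(re.sub(r'{}.*?{}'.format(*latex_bounds), extract, string))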
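If counting backslashes by hand feels error-prone, one alternative is to let re.escape add the regex-level escaping. This is a minimal sketch, not part of the original answer; the names pattern, text and result are chosen here for illustration only.

```python
import re

# Delimiters exactly as they appear in the text: a backslash plus a parenthesis.
latex_start = r"\("
latex_end = r"\)"

# re.escape adds the escaping the regex engine needs for both characters.
pattern = re.compile(re.escape(latex_start) + r".*?" + re.escape(latex_end))

text = r"based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing"
result = pattern.sub("", text)

print(pattern.pattern)  # the pattern string the regex engine actually receives
print(result)
```

Printing pattern.pattern is a handy way to check what the regex engine really sees once Python has processed the string literal.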

You should define string as a raw string, because otherwise \v is interpreted as a special character (a vertical tab).

import re

string = r" your answer is wrong! Solution: based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ... "

# \\ matches a literal backslash; \( and \) match literal parentheses
string = re.sub(r'\\\(.*?\\\)', '', string)
print(string)

Prints:

 your answer is wrong! Solution: based on  and  beeing ...
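The point about \v can be checked directly. The sketch below only compares a normal and a raw literal; the variable names plain and raw are made up for this illustration. The substitution itself would still match across a vertical tab, so this likely matters most once the function needs the original LaTeX source.

```python
plain = "\vec"   # "\v" is the escape sequence for a vertical tab (0x0b)
raw = r"\vec"    # the raw literal keeps the backslash and the "v"

print(len(plain), len(raw))
print(repr(plain), repr(raw))
```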

If you need variables for the start and end:

latex_end = r"\\\)"
latex_start = r"\\\("
string = re.sub(r'{}.*?{}'.format(latex_start, latex_end), '', string)
print(string)
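Since the question wants each formula replaced by a function's output rather than removed, here is a rough sketch of how that might look with a capturing group. The body of extract is a hypothetical stand-in for the real conversion, and re.DOTALL is an assumption in case a formula spans several lines.

```python
import re

def extract(match):
    # Hypothetical stand-in: group(1) is the LaTeX source between the delimiters.
    latex = match.group(1).strip()
    return "[LATEX: {}]".format(latex)

# The parentheses around .*? capture the formula without its \( and \) delimiters.
pattern = re.compile(r"\\\((.*?)\\\)", re.DOTALL)

string = r"based on \((\vec{n_E},\vec{g})= 0 \) and \(d(g,E)=0\) beeing ..."
converted = pattern.sub(extract, string)
print(converted)
```

When the replacement is a function, re.sub inserts its return value as-is, so backslashes in the returned text need no extra escaping.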