Python Regex for Negative Lookaround for multiline content

Question

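I am new to Python. I am working with LaTeX files that contain a lot of math, program code, and so on. I have replaced runs of multiple spaces (" +") with a single space (" "). But I need to leave some parts of my file untouched. For example:

Plain text: "Hai, I am    New to       Python". Replacing the multiple spaces with a single space gives "Hai, I am New to Python". This regex is applied to the whole document. But I need to leave multiple spaces alone inside certain LaTeX environments. For example: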

Hai, I am    New to       Python
\begin{lstlisting}[title=Sample]
      print("Hai, I am    New to       Python")
      def Code(a):
          print(a)
      Code("Hai, i am new to Perl")
\end{lstlisting}

After my code runs, multiple spaces have been changed to single spaces between \begin{lstlisting} and \end{lstlisting} as well:

"Hai, I am New to Python"
\begin{lstlisting}[title=Sample]
 print("Hai, I am New to Python")
 def Code(a):
 print(a)
 Code("Hai, i am new to Perl")
\end{lstlisting}

How can I make the Python regex skip everything between \begin{lstlisting} and \end{lstlisting}?

Answer 1

A proper LaTeX parser is the way to go, but this may be a 'good enough' solution. See what you think.

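import re

# Raw string, so that \b in \begin is not read as a backspace escape.
text = r'''
Hai, I am    New to       Python
\begin{lstlisting}[title=Sample]
      print("Hai, I am    New to       Python")
      def Code(a):
          print(a)
      Code("Hai, i am new to Perl")
\end{lstlisting}
'''

# Collapse a run of spaces unless \end{lstlisting} follows with no
# \begin{lstlisting} in between, i.e. unless the spaces are inside a listing.
# The backslashes are doubled so the regex matches a literal backslash.
text = re.sub(r' +(?!(?:(?!\\begin\{lstlisting\}).)*\\end\{lstlisting\})', ' ', text, flags=re.DOTALL)

print(text)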

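It works by not replacing a run of spaces when \end{lstlisting} appears later in the string without a \begin{lstlisting} appearing before it, which means the spaces are inside a listing.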
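The same effect can also be had without a lookahead, by letting one pattern match either a whole listing or a run of spaces and deciding what to do in a replacement function. This is only a sketch of that alternative, not part of the answer above; the names PATTERN, collapse_spaces and sample are made up for illustration, and it assumes listings are not nested and that every \begin{lstlisting} has a matching \end{lstlisting}.

```python
import re

# Hypothetical helper pattern: group 1 captures a whole lstlisting
# environment; the second alternative matches a run of spaces.
PATTERN = re.compile(
    r'(\\begin\{lstlisting\}.*?\\end\{lstlisting\})| +',
    flags=re.DOTALL,
)


def collapse_spaces(text):
    """Collapse runs of spaces to one space, leaving lstlisting blocks as-is."""
    def repl(match):
        # Group 1 is set only when a listing matched: return it unchanged.
        if match.group(1) is not None:
            return match.group(1)
        # Otherwise the match is a run of spaces: replace it with one space.
        return ' '
    return PATTERN.sub(repl, text)


sample = r'''Hai, I am    New to       Python
\begin{lstlisting}[title=Sample]
      print("Hai, I am    New to       Python")
\end{lstlisting}
Done   now'''

print(collapse_spaces(sample))
```

Because the listing alternative is tried first and consumes the whole environment in one match, the spaces inside it are never seen by the second alternative. It also avoids rescanning the rest of the string for every run of spaces, which the lookahead version does.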
