How to automate \left and \right pairing in LaTeX equations using Python

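Question

In LaTeX, delimiters such as (…), […], and {…} can be made to scale with the size of the enclosed expression by putting \left and \right before the opening and closing delimiters respectively; for example, \left( <equation> \right).

However, these must come in pairs: every \left needs a matching \right, otherwise LaTeX reports an error. Sometimes only the opening or the closing delimiter is wanted. This is handled by adding \left. (\left dot) or \right. (\right dot) in place of the missing half, for example \left( <equation> \right. or \left. <equation> \right)

Q: How can the missing half of each pair be inserted automatically?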

Sample input:

\begin{align}
    \left( content & content \right) content \left( content \left( content \right) \\
    content \right) \left(content \left( content \\
    content \right)
\end{align}

\begin{align}
    \left( content & content \right) content \left( content \left( content \right) \nonumber \\
    content \right) \left( content \left( content \nonumber \\
    content \right)
\end{align}

The output should be:

\begin{align}
    \left( content [\right.] & [\left.] content \right) content \left( content \left( content \right) [\right.] \\
    [\left.] content \right) \left( content \left( content [\right.] [\right.] \\
    [\left.] content \right)
\end{align}

\begin{align}
    \left( content [\right.] & [\left.] content \right) content \left( content \left( content \right) [\right.] \nonumber \\
    [\left.] content \right) \left( content \left( content [\right.] [\right.] \nonumber \\
    [\left.] content \right)
\end{align}

The parts between square brackets are what should be generated automatically (without the square brackets themselves).

If there is no pair,

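Answer

This problem would be better solved in general with the help of a proper LaTeX parser. However, if you want to solve this specific problem in Python as you have stated it, here is some code that does the job.

To make the code work out of the box, you only need to replace the contents of the snippet variable with the string you are interested in.

The code assumes that you are balancing single-line or multi-line equations inside align blocks, and that your snippet is an uninterrupted (apart from whitespace) series of such blocks, as in this sample. You should be fine with whitespace in the equations being stripped and rearranged.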

import re

snippet: str = r"""
\begin{align}
    \left( content & content \right) content \left( content \left( content \right) \\
    content \right) \left(content \left( content \\
    content \right)
\end{align}

\begin{align}
    \left( content & content \right) content \left( content \left( content \right) \nonumber \\
    content \right) \left( content \left( content \nonumber \\
    content \right)
\end{align}
"""

# regex to capture stuff within the align blocks
# (inside a raw-string regex, a literal backslash is written as \\)
re_align = re.compile(r'\\begin\{align\}(.*?)\\end\{align\}', flags=re.DOTALL)

# left bracket patterns
re_parens_left = re.compile(r'\\left\(', flags=re.DOTALL)
re_braces_left = re.compile(r'\\left\\\{', flags=re.DOTALL)
re_square_left = re.compile(r'\\left\[', flags=re.DOTALL)
# right bracket patterns
re_parens_right = re.compile(r'\\right\)', flags=re.DOTALL)
re_braces_right = re.compile(r'\\right\\\}', flags=re.DOTALL)
re_square_right = re.compile(r'\\right\]', flags=re.DOTALL)

# line break: a double backslash plus any surrounding whitespace
re_break = re.compile(r'[\s]*\\\\[\s]*', flags=re.DOTALL)
re_nonum = re.compile(r'\\nonumber', flags=re.DOTALL)

# function that does the balancing for a column string; invoked by main loop below
from collections import deque
def balance(string: str, re_left: re.Pattern, re_right: re.Pattern) -> str:
    """
        for a given bracket type, identify all occurrences of the current bracket,
            and balance them using the standard stack-based algorithm; Python collections'
            'deque' data structure serves the purpose of a stack here.
    """
    re_either = re.compile(re_left.pattern + '|' + re_right.pattern, flags=re.DOTALL)
    match_list = deque(re_either.findall(string))

    if len(match_list) == 0:
        return string # early exit if no brackets => no balancing needed

    balance_stack = deque()
    for item in match_list:
        if re_left.match(item): current_bracket = 'l'
        elif re_right.match(item): current_bracket = 'r'
        else: raise ValueError(f"got problematic bracket '{item}' in 'balance'")

        previous_bracket = balance_stack[-1] if len(balance_stack) > 0 else None

        if (previous_bracket == 'l') and (current_bracket == 'r'):
            balance_stack.pop()
        else:
            balance_stack.append(current_bracket)

    # whatever's left on the stack is the imbalance
    remaining = ''.join(balance_stack)
    imbalance_left = remaining.count('l')
    imbalance_right = remaining.count('r')

    balance_string_left = ' ' + ' '.join([r'\right.'] * imbalance_left) if imbalance_left > 0 else ''
    balance_string_right = ' '.join([r'\left.'] * imbalance_right) + ' ' if imbalance_right > 0 else ''

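    # move any \nonumber to the very end of the column, after the inserted \right.
    nonum_match = re_nonum.search(string) is not None
    result = re_nonum.sub('', string)
    nonum_string = r' \nonumber ' if nonum_match else ''  # raw string: '\n' must not become a newline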
    result = balance_string_right + result + balance_string_left + nonum_string
    return result

# main loop
result_equations = []
for equation in re_align.findall(snippet):
    lines = re_break.split(equation.strip()) # split on double backslash
    result_lines = []
    for line in lines:
        columns = line.strip().split('&')
        result_columns = []
        for column in columns:
            # balance brackets using the stack algorithm
            result_column = column.strip()
            # for each type of bracket () or \{\} or [], return the balanced string 
            result_column = balance(result_column, re_parens_left, re_parens_right)
            result_column = balance(result_column, re_braces_left, re_braces_right)
            result_column = balance(result_column, re_square_left, re_square_right)
            
            result_columns.append(result_column)
        
        result_line = ' & '.join(result_columns)
        result_lines.append(result_line)

    # backslashes are doubled here because these are ordinary (non-raw) string literals
    result_equation = '\\begin{align}\n    ' + ' \\\\\n    '.join(result_lines) + '\n\\end{align}'
    result_equations.append(result_equation)

result = '\n\n'.join(result_equations)

print(result)

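How the code works

The code relies on Python's re (regular expression) library to identify the patterns of interest. The first part of the code compiles the bracket patterns and the other patterns we expect to use.

Next comes the main loop, where the input string snippet is broken down hierarchically: first into align equation blocks, then into the lines within each equation (separated by \\), and finally into the columns within each line (separated by &).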
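The decomposition can be tried in isolation. The following is a minimal sketch, not part of the code above: the `decompose` helper and the tiny `sample` string are hypothetical names introduced only for illustration, and the regexes assume that line breaks are written as a literal double backslash.

```python
import re

# Hypothetical two-line, two-column sample, for illustration only.
sample = r"""
\begin{align}
    a & b \\
    c & d
\end{align}
"""

re_align = re.compile(r'\\begin\{align\}(.*?)\\end\{align\}', flags=re.DOTALL)
re_break = re.compile(r'\s*\\\\\s*')  # literal double backslash plus surrounding whitespace

def decompose(text: str) -> list:
    """Return a nested list: blocks -> lines -> columns (stripped strings)."""
    blocks = []
    for body in re_align.findall(text):
        lines = re_break.split(body.strip())
        blocks.append([[col.strip() for col in line.split('&')] for line in lines])
    return blocks

print(decompose(sample))  # [[['a', 'b'], ['c', 'd']]]
```

Each innermost string is one column, which is the unit on which balancing is performed.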

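For each column, the code balances brackets using the standard stack-based algorithm; this is done in the balance function, once per bracket type. An adjustment is made for \nonumber: if it is present, it is removed and appended again after any inserted \right., so that it stays at the end of the column.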
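Stripped of the regex handling, the stack step reduces to a few lines. This is a simplified sketch: `count_unmatched` is a hypothetical helper, and it assumes the brackets of one type have already been reduced to a sequence of 'l' (for \left) and 'r' (for \right) markers.

```python
def count_unmatched(tokens):
    """Return (unmatched_left, unmatched_right) for a sequence of 'l'/'r' markers."""
    stack = []
    for tok in tokens:
        if tok == 'r' and stack and stack[-1] == 'l':
            stack.pop()        # this 'r' closes the most recent open 'l'
        else:
            stack.append(tok)  # an opener, or a closer with nothing to close
    return stack.count('l'), stack.count('r')

# Second column of the first sample line:
#   content \right) content \left( content \left( content \right)
print(count_unmatched(['r', 'l', 'l', 'r']))  # (1, 1)
```

Each unmatched 'l' calls for one \right. appended to the column, and each unmatched 'r' for one \left. prepended to it.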

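The code then joins the balanced columns, lines, and equations back together to assemble the final result.

Limitations

The code is somewhat cumbersome, but it solves the problem as you stated it, making reasonable simplifying assumptions wherever your specification is ambiguous. Cases where it will fail (not an exhaustive list):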

\begin{align}
    \left( content & content \\ % comment: the wandering explorer turned \left(
    content \textup{sneaked in a \left( payload}
\end{align}

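Recognizing comments and heavily nested syntax with odd edge cases is beyond the scope of this code. If you intend to use this on any material, I recommend staying vigilant.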
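If comments are the main worry, one possible mitigation is a pre-pass that removes them before balancing. The sketch below is an assumption on my part rather than part of the code above: `strip_comments` is a hypothetical helper, it treats a % preceded by an odd number of backslashes as an escaped percent sign, and it does not understand verbatim-like environments or the \textup case shown above.

```python
import re

# An unescaped % starts a comment. The group captures any preceding pairs of
# backslashes (e.g. a \\ line break) so that they are kept in the output.
re_comment = re.compile(r'(?<!\\)((?:\\\\)*)%.*')

def strip_comments(text: str) -> str:
    """Remove LaTeX comments up to the end of each line, keeping escaped \\% intact."""
    return re_comment.sub(r'\1', text)

print(strip_comments(r'a \left( b % stray \left('))  # a \left( b
print(strip_comments(r'50\% off'))                   # 50\% off
```

Applying this to snippet before the main loop would keep a stray \left( inside a comment from being counted, at the cost of dropping the comments from the output.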