Perl Regex for mathjax syntax

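I'm having trouble crafting a Perl regex that changes \ characters according to the following rules:

  1. the matching sequence should start with \(
  2. it should end with \)
  3. any \ character inside the matched sequence should be replaced by a double backslash \\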

Example text for reference:

Se la \probabilità dell'evento\ A è \(\frac{3}{4} \) e la
probabilità dell'evento B è \(\frac{1}{4}\) 
\(\frac{3}{4} +\frac{3}{4}\) .
\(\frac{1}{4} - \frac{3}{4}\) .
\(\frac{3}{16}\) .
\(\frac{1}{2}\) .

should become:

Se la \probabilità dell'evento\ A è \\(\\frac{3}{4} \\) e la
probabilità dell'evento B è \\(\\frac{1}{4}\\) 
\\(\\frac{3}{4} +\\frac{3}{4}\\) .
\\(\\frac{1}{4} - \\frac{3}{4}\\) .
\\(\\frac{3}{16}\\) .
\\(\\frac{1}{2}\\) .

So far, this is my best attempt:

s/(\\\()(.*)(\\)(.*)(\\\))/\\\\\($2\\\\$4\\\\\)/mg

which yields:

Se la \probabilità dell'evento\ A è \\(\\frac{3}{4} \\) e la
probabilità dell'evento B è \\(\\frac{1}{4}\\) 
\\(\frac{3}{4} +\\frac{3}{4}\\) .
\\(\frac{1}{4} - \\frac{3}{4}\\) .
\\(\\frac{3}{16}\\) .
\\(\\frac{1}{2}\\) .

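As you can see,

\\(\frac{3}{4} +\\frac{3}{4}\\) .
\\(\frac{1}{4} - \\frac{3}{4}\\) .

are wrong: the first \frac in each line keeps its single backslash.

How can I modify my regex to match my needs?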
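For what it's worth, the two failing lines seem to come from the first (.*) being greedy: it swallows everything up to the last backslash that can still be followed by \), so any earlier backslash ends up inside a capture group and is copied back unchanged. A small sketch to inspect the captures, assuming the attempt has the five-group shape described above (the variable names are only for illustration):

```perl
use strict;
use warnings;

# One of the lines the attempt gets wrong.
my $line = '\(\frac{3}{4} +\frac{3}{4}\)';

# Five capture groups: opener, greedy text, one backslash, text, closer.
# Used here only to look at what each group captured.
my @groups = $line =~ /(\\\()(.*)(\\)(.*)(\\\))/;

# Group 2 is what the greedy (.*) took. It still contains a single
# backslash, and the replacement never touches it.
print "group 2: $groups[1]\n";   # group 2: \frac{3}{4} +
print "group 4: $groups[3]\n";   # group 4: frac{3}{4}
```

Only the one backslash captured by the third group is doubled per match, which is why lines with a single \frac come out right and lines with two do not.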

I tested @sln's regex

s/(?x)(?:(?!\A)\G[^\\]*\K\\|\\(?=\())(?=.*?(?<=\\)\))/\\\\/g;

It seems to work, although it is still an arcane mystery to me.

Update with explanation

Formatted and tested:

 (?s)               # Inline Dot-All modifier
 (?:                # Cluster start
      (?! \A )           # Not beginning of string
      \G                 # G anchor - If matched before, start at end of last match
      [^\\]*             # Many non-escape's
      \K                 # Previous is not part of match
      \\                 # A lone escape
   |                   # or,
                         # Start of an opening '\('
      \\                 # A lone escape
      (?= \( )           #   followed by an open parenth
 )                  # Cluster end
 (?=                # Lookahead, each match validates a final '\)'
      .*? 
      (?<= \\ )
      \) 
 )
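To make the formatted version above easier to try out, here is a minimal sketch that runs it as an /x substitution. The sample string is made up for illustration, the /s and /x flags stand in for the inline (?s) modifier, and \K needs Perl 5.10 or later:

```perl
use strict;
use warnings;
use 5.010;   # \K needs Perl 5.10 or later; also enables say

# Made-up sample: one math block with stray backslashes on both sides.
my $str = 'A \x and \(\frac{3}{4} \) then \y';

$str =~ s/
    (?:
        (?! \A ) \G      # continue right after the previous match
        [^\\]*           # skip characters that are not backslashes
        \K               # keep the skipped text out of the match
        \\               # the backslash to double
      |
        \\ (?= \( )      # or: the backslash that opens a block
    )
    (?= .*? (?<= \\ ) \) )   # lookahead: a block close must still follow
/\\\\/gsx;

say $str;   # A \x and \\(\\frac{3}{4} \\) then \y
```

Each match is exactly one backslash, so the replacement is always the same two-backslash string. The \G branch is what chains the matches together inside a block.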

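Posting an update to my original regex.

The original validated the closing \) on every escape.
Looking at it again, it can be sped up by doing that validation only once,
when it finds the start of a block.

At the bottom is a benchmark comparing the two methods.

Updated regex:

$str =~ s/(?s)(?:(?!\A)\G(?!\))[^\\]*\K\\|\\(?=\(.*?\\\)))/\\\\/g;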

Formatted and tested:

 (?s)               # Dot-All modifier
 (?:                # Cluster start
      (?! \A )           # Not beginning of string
      \G                 # G anchor - If matched before, start at end of last match
      (?! \) )           # Last was an escape, so ')' ends the block
      [^\\]*             # Many non-escape's
      \K                 # Previous is not part of match
      \\                 # A lone escape
   |                   # or,
                         # New Block Check - 
      \\                 # A lone escape then,
      (?=                # One time Validation:
           \(                 #  an opening '('
           .*?                #  anything
           \\ \)              #  then a final '\)'
      )                  # -------------
 )                  # Cluster end
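As a quick check of the updated regex, the sketch below wraps the substitution in a small helper and runs it on a line with two blocks and stray backslashes between them. The helper name double_mathjax_escapes and the sample line are made up for illustration:

```perl
use strict;
use warnings;
use 5.010;   # \K needs Perl 5.10 or later; also enables say

# Hypothetical helper: doubles every backslash from an opening \( through
# its closing \), and leaves backslashes outside such blocks alone.
sub double_mathjax_escapes {
    my ($str) = @_;
    $str =~ s/(?s)(?:(?!\A)\G(?!\))[^\\]*\K\\|\\(?=\(.*?\\\)))/\\\\/g;
    return $str;
}

my $in  = 'la \prob\ A \(\frac{3}{4} +\frac{3}{4}\) e \B\ poi \(\frac{1}{2}\) .';
my $out = double_mathjax_escapes($in);

say $out;
# la \prob\ A \\(\\frac{3}{4} +\\frac{3}{4}\\) e \B\ poi \\(\\frac{1}{2}\\) .
```

The (?!\)) right after \G is what stops the chain once the closing backslash has been doubled, so the backslashes in \B\ between the two blocks are left as they are.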

Benchmark:

Sample: \( \\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \)

Results

New Regex:   (?s)(?:(?!\A)\G(?!\))[^\\]*\K\\|\\(?=\(.*?\\\)))
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   31
Elapsed Time:    1.25 s,   1253.92 ms,   1253924 µs


Old Regex:   (?s)(?:(?!\A)\G[^\\]*\K\\|\\(?=\())(?=.*?(?<=\\)\))
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   31
Elapsed Time:    3.95 s,   3952.31 ms,   3952307 µs
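
The figures above come from a regex benchmarking tool. A rough way to repeat the comparison in plain Perl is the core Benchmark module, as in the sketch below. It assumes the sample holds 29 backslashes between the delimiters, which fits the 31 matches per iteration reported above (29 plus the opening and closing one). The helper name run_with is made up, and absolute timings will differ from the ones shown.

```perl
use strict;
use warnings;
use 5.010;   # \K needs Perl 5.10 or later
use Benchmark qw(cmpthese);

# Assumed sample: '\( ', then 29 backslashes, then ' \)'.
my $sample = '\( ' . ('\\' x 29) . ' \)';

my $new = qr/(?s)(?:(?!\A)\G(?!\))[^\\]*\K\\|\\(?=\(.*?\\\)))/;
my $old = qr/(?s)(?:(?!\A)\G[^\\]*\K\\|\\(?=\())(?=.*?(?<=\\)\))/;

# Hypothetical helper: run one substitution on a fresh copy of the sample
# and return the result.
sub run_with {
    my ($re) = @_;
    (my $copy = $sample) =~ s/$re/\\\\/g;
    return $copy;
}

# A negative count means: run each candidate for at least that many
# CPU seconds, then print a rate comparison table.
cmpthese(-1, {
    new => sub { run_with($new) },
    old => sub { run_with($old) },
});
```

Both patterns should produce the same output on this sample, so only the running time differs.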