Perl Regex for mathjax syntax
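I'm having trouble writing a Perl regular expression that changes \ characters according to the following rules:
- the matched sequence should begin with \(
- it should end with \)
- any \ character inside such a sequence should be replaced with a double backslash \\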
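For comparison, the rules above can also be sketched without \G: match each \( ... \) block non-greedily and rewrite only the text inside it. This is an alternative technique, not the approach used later in this thread. The helper name double_math_backslashes and the test string are hypothetical, and s///r needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): double every backslash
# inside each \( ... \) block and leave text outside the blocks alone.
sub double_math_backslashes {
    my ($text) = @_;
    # (\\\(.*?\\\)) captures a literal "\(" up to the nearest "\)";
    # /s lets a block span lines. /e runs the replacement as code:
    # s///r returns a copy of the block with each backslash doubled.
    $text =~ s{(\\\(.*?\\\))}{ $1 =~ s/\\/\\\\/gr }gse;
    return $text;
}

my $in  = 'Se la \prob\ A \(\frac{3}{4} +\frac{3}{4}\) .';
my $out = double_math_backslashes($in);
print "$out\n";
```

Only the four backslashes inside the \( ... \) block are doubled; \prob\ outside it is left as is.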
Sample text for reference:
Se la \probabilità dell'evento\ A è \(\frac{3}{4} \) e la
probabilità dell'evento B è \(\frac{1}{4}\)
\(\frac{3}{4} +\frac{3}{4}\) .
\(\frac{1}{4} - \frac{3}{4}\) .
\(\frac{3}{16}\) .
\(\frac{1}{2}\) .
It should become:
Se la \probabilità dell'evento\ A è \\(\\frac{3}{4} \\) e la
probabilità dell'evento B è \\(\\frac{1}{4}\\)
\\(\\frac{3}{4} +\\frac{3}{4}\\) .
\\(\\frac{1}{4} - \\frac{3}{4}\\) .
\\(\\frac{3}{16}\\) .
\\(\\frac{1}{2}\\) .
So far, this is my best attempt:
s/(\\\()(.*)(\\)(.*)(\\\))/\\\\($2\\\\$4\\\\)/mg
which produces:
Se la \probabilità dell'evento\ A è \\(\\frac{3}{4} \\) e la
probabilità dell'evento B è \\(\\frac{1}{4}\\)
\\(\frac{3}{4} +\\frac{3}{4}\\) .
\\(\frac{1}{4} - \\frac{3}{4}\\) .
\\(\\frac{3}{16}\\) .
\\(\\frac{1}{2}\\) .
As you can see, these two lines are wrong:
\\(\frac{3}{4} +\\frac{3}{4}\\) .
\\(\frac{1}{4} - \\frac{3}{4}\\) .
How can I modify my regex to do what I need?
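A minimal reproduction of why the attempt fails. The pattern is the one from the question; the replacement part is an assumed reconstruction that keeps groups 2 and 4. One match can double only the single backslash captured by group 3, and the greedy (.*) in group 2 pushes that capture onto the last \frac, so the first \frac keeps a single backslash.

```perl
use strict;
use warnings;

my $line = '\(\frac{3}{4} +\frac{3}{4}\) .';

# Pattern from the question; the replacement is an assumed reconstruction.
(my $out = $line) =~ s/(\\\()(.*)(\\)(.*)(\\\))/\\\\($2\\\\$4\\\\)/mg;

# Greedy (.*) in group 2 swallows "\frac{3}{4} +", so group 3 only
# captures the backslash of the last \frac before the closing "\)".
print "$out\n";
```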
I tested @sln's regex:
s/(?x)(?:(?!\A)\G[^\\]*\K\\|\\(?=\())(?=.*?(?<=\\)\))/\\\\/g;
It seems to work, although it is still a mystery to me.
Update: explanation
 (?s)                      # Inline Dot-All modifier
 (?:                       # Cluster start
      (?! \A )             # Not beginning of string
      \G                   # G anchor - If matched before, start at end of last match
      [^\\]*               # Many non-escapes
      \K                   # Previous is not part of match
      \\                   # A lone escape
   |                       # or,
                           # Start of an opening '\('
      \\                   # A lone escape
      (?= \( )             # followed by an open paren
 )                         # Cluster end
 (?=                       # Lookahead, each match validates a final '\)'
      .*?
      (?<= \\ )
      \)
 )
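A small sketch that shows where each match of this pattern lands. The test string and the offset-collecting loop are illustrative, not from the thread. Because \K drops the [^\\]* prefix from the match, every match is exactly one backslash, so $-[0] is that backslash's offset. The \a outside the block is never matched, because no \G chain has started when the scan passes it.

```perl
use strict;
use warnings;

# sln's original pattern in /x form.
my $re = qr{
    (?s)
    (?:
        (?! \A ) \G [^\\]* \K \\      # continue right after the previous match
      | \\ (?= \( )                   # or start at a backslash followed by "("
    )
    (?= .*? (?<= \\ ) \) )            # a backslash + ")" must still lie ahead
}x;

my $str = 'x \a \(\frac{1}{2}\) y';
my @offsets;
while ($str =~ /$re/g) {
    # With \K, $-[0] is the offset of the single backslash that matched.
    push @offsets, $-[0];
}
print "@offsets\n";
```

This prints the offsets of the backslashes in \(, \frac and \), which are exactly the ones s///g would double.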
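Posting an updated version of my original regex.
The original validated the closing \) on every escape it matched.
Looking at it again, it can be sped up by doing that validation only once, when it finds the start of a block.
At the bottom is a benchmark comparing the two approaches.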
Updated regex:
$str =~ s/(?s)(?:(?!\A)\G(?!\))[^\\]*\K\\|\\(?=\(.*?\\\)))/\\\\/g;
 (?s)                      # Dot-All modifier
 (?:                       # Cluster start
      (?! \A )             # Not beginning of string
      \G                   # G anchor - If matched before, start at end of last match
      (?! \) )             # Last was an escape, so ')' ends the block
      [^\\]*               # Many non-escapes
      \K                   # Previous is not part of match
      \\                   # A lone escape
   |                       # or,
                           # New Block Check -
      \\                   # A lone escape then,
      (?=                  # One time Validation:
           \(              # an opening '('
           .*?             # anything
           \\ \)           # then a final '\)'
      )                    # -------------
 )                         # Cluster end
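A sketch applying the updated one-liner to an ASCII-only excerpt of the question's sample (accents dropped for portability) and printing the result. The heredoc uses <<'END', in which backslashes are taken literally, so the text is exactly what the regex sees.

```perl
use strict;
use warnings;

# Excerpt of the sample text; <<'END' keeps every backslash as written.
my $str = <<'END';
Se la \probabilita dell'evento\ A e \(\frac{3}{4} \) e la
\(\frac{3}{4} +\frac{3}{4}\) .
\(\frac{1}{2}\) .
END

# Updated regex from the answer: each match is one backslash inside a
# \( ... \) block, replaced by two.
$str =~ s/(?s)(?:(?!\A)\G(?!\))[^\\]*\K\\|\\(?=\(.*?\\\)))/\\\\/g;
print $str;
```

The \probabilita and evento\ escapes before the first \( are left alone, while every backslash from \( through \) is doubled.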
Benchmark:
Sample: \( \\\\\\\\\\\\\\\\\\\\\\\\\\\\\ \)
Results:
New Regex:   (?s)(?:(?!\A)\G(?!\))[^\\]*\K\\|\\(?=\(.*?\\\)))
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 31
Elapsed Time: 1.25 s, 1253.92 ms, 1253924 µs
Old Regex:   (?s)(?:(?!\A)\G[^\\]*\K\\|\\(?=\())(?=.*?(?<=\\)\))
Options: < none >
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 31
Elapsed Time: 3.95 s, 3952.31 ms, 3952307 µs
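A rough way to reproduce the comparison with Perl's core Benchmark module. The sample string is an assumption, reconstructed from the reported 31 matches per iteration (the backslash of \(, 29 backslashes, the backslash of \)). The figures above were not produced by this script, so expect different absolute timings.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed sample: "\(", a space, 29 backslashes, a space, "\)".
my $sample = '\\( ' . ('\\' x 29) . ' \\)';

my $old = qr/(?s)(?:(?!\A)\G[^\\]*\K\\|\\(?=\())(?=.*?(?<=\\)\))/;
my $new = qr/(?s)(?:(?!\A)\G(?!\))[^\\]*\K\\|\\(?=\(.*?\\\)))/;

# Count matches with a while loop so \G resumes at the previous match end.
sub count_matches {
    my ($re, $s) = @_;
    my $n = 0;
    $n++ while $s =~ /$re/g;
    return $n;
}

my $n_old = count_matches($old, $sample);
my $n_new = count_matches($new, $sample);
print "matches per run: old=$n_old new=$n_new\n";

# Run each substitution for at least 1 CPU second and print a rate table.
cmpthese(-1, {
    old => sub { (my $s = $sample) =~ s/$old/\\\\/g; },
    new => sub { (my $s = $sample) =~ s/$new/\\\\/g; },
});
```

Both patterns find the same matches; the difference is that the old one re-runs its \) lookahead on every backslash, while the new one checks it once per block.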