Why does my regular expression grab the EOLN as well?

Question

I'm trying to write a batch file to automate bulk edits to some Pascal source code. My source files occasionally contain a line like this:

     //{## identifier} Inc (Index) ; { a comment }    // another comment

I want to change all of them to:

     {$ifdef identifier} Inc (Index) ; { a comment }    // another comment {$endif}

Below is the test batch file I'm using.

:: File TestRXRepl.bat
:: ===================     

@echo     //{##   identifier} Inc (Index) ; { a comment }    // another comment >t.pas
@set "FindRegExp=(\ *)\/\/\{\#\#\ *([a-z,0-9,_]+)\}(\ *)(.*)"
@set "ReplRegExp={$ifdef } {$endif}"

rxrepl --file t.pas --output t.out --search "%FindRegExp%" --replace "%ReplRegExp%"
@type t.pas
@type t.out

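The regular expression is supposed to:

capture the leading indentation (group 1)
match //{##
skip any spaces
capture the identifier (group 2)
match }
capture the source-code indentation (group 3)
capture the rest of the source line, from there to the end of the line (group 4)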
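A minimal sketch of the steps above, using Python's re module as a stand-in for rxrepl's PCRE2 engine (an assumption; the constructs in this particular pattern behave the same in both). The replacement template REPL is hypothetical: it uses Python's \1-style backreferences and is not necessarily rxrepl's syntax. With no newline in the input, the four groups come out as described and {$endif} lands on the same line.

```python
import re

# The question's FindRegExp, unchanged; Python's re accepts the same escapes.
# Note: the commas inside [a-z,0-9,_] are literal, so a comma would also match.
FIND = r"(\ *)\/\/\{\#\#\ *([a-z,0-9,_]+)\}(\ *)(.*)"
# Hypothetical replacement template in Python's \1-style syntax (not rxrepl's).
REPL = r"\1{$ifdef \2}\3\4 {$endif}"

line = "     //{##   identifier} Inc (Index) ; { a comment }    // another comment"
m = re.match(FIND, line)
indent, ident, gap, rest = m.groups()  # groups 1..4
result = re.sub(FIND, REPL, line)

print(repr(indent))  # '     '
print(ident)         # identifier
print(repr(gap))     # ' '
print(rest)          # Inc (Index) ; { a comment }    // another comment
print(result)
```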

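Everything works except the end-of-line handling. Group 4 should capture everything from the start of the source code to the end of the line, but it seems to include the end-of-line itself, with the result that {$endif} is written on the next line, i.e. I get: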

{$ifdef identifier} Inc (Index) ; { a comment }    // another comment
{$endif}

instead of:

{$ifdef identifier} Inc (Index) ; { a comment }    // another comment {$endif}

The tool I'm using is RXRepl. It has an --eol option that sounds like it might help, but I haven't been able to change the behavior with it.

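(Notes)

I know both results are syntactically correct, but that's not the point ;-)
Groups 3 and 4 could be merged.
It doesn't handle other whitespace characters.
I know there are more conventional ways to match an identifier.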

Suggestions for making it more elegant are welcome, as well as suggestions for making it work.

Answer 1

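The problem seems to be that your . is matching the newline, which means the PCRE2_DOTALL option is in effect. (I don't know why that is; perhaps rxrepl always sets that option by default.)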
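To illustrate the diagnosis, here is a sketch in Python's re, where the DOTALL flag can be toggled explicitly. It assumes (not confirmed from rxrepl's documentation) that each line reaches the regex together with its trailing newline. FIND is the question's pattern; REPL is a hypothetical replacement template in Python's syntax.

```python
import re

FIND = r"(\ *)\/\/\{\#\#\ *([a-z,0-9,_]+)\}(\ *)(.*)"
REPL = r"\1{$ifdef \2}\3\4 {$endif}"  # hypothetical, Python syntax

# Assumption: the line is matched with its trailing newline attached.
line = "//{## identifier} Inc (Index) ;\n"

# Without DOTALL, '.' stops at the newline, which stays after {$endif}.
plain = re.sub(FIND, REPL, line)
# With DOTALL, '.' also matches '\n', so group 4 swallows it and
# {$endif} is emitted after the newline, i.e. on the next line.
dotall = re.sub(FIND, REPL, line, flags=re.DOTALL)

print(repr(plain))   # '{$ifdef identifier} Inc (Index) ; {$endif}\n'
print(repr(dotall))  # '{$ifdef identifier} Inc (Index) ;\n {$endif}'
```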

One possible workaround is to end group 4 with (.*\S) in your match. The \S character type matches any character that is not whitespace, so it excludes the newline.
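A sketch of this workaround in Python's re (standing in for PCRE2), with DOTALL deliberately switched on to mimic the behavior described. REPL is a hypothetical Python-syntax replacement template.

```python
import re

# Group 4 now ends with \S, which can never match a newline (or a space).
FIND = r"(\ *)\/\/\{\#\#\ *([a-z,0-9,_]+)\}(\ *)(.*\S)"
REPL = r"\1{$ifdef \2}\3\4 {$endif}"  # hypothetical, Python syntax

line = "//{## identifier} Inc (Index) ;\n"
# Even with DOTALL on, '.*' backtracks until '\S' lands on the last
# non-whitespace character, so the newline is left outside the match.
fixed = re.sub(FIND, REPL, line, flags=re.DOTALL)
print(repr(fixed))  # '{$ifdef identifier} Inc (Index) ; {$endif}\n'
```

Note the side effects: trailing spaces before the newline also fall outside group 4, and a line with nothing but spaces after the closing } no longer matches at all.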

But probably the best way to solve this is to use the \N escape sequence, which the PCRE2 manual describes as:

The \N escape sequence has the same meaning as the "." metacharacter when PCRE2_DOTALL is not set, but setting PCRE2_DOTALL does not change the meaning of \N.

So simply using (\N*) for group 4 in your match will match everything it currently matches, except the trailing newline.

In your script, just update this line:

@set "FindRegExp=(\ *)\/\/\{\#\#\ *([a-z,0-9,_]+)\}(\ *)(\N*)"
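Python's re has no bare \N (there it only introduces a named character, \N{...}), so this sketch substitutes the negated class [^\n], which, like \N, is unaffected by DOTALL. Treating the two as equivalent assumes PCRE2's newline convention is plain LF. REPL is a hypothetical Python-syntax replacement template.

```python
import re

# [^\n] stands in for PCRE2's \N: any character except a newline,
# regardless of DOTALL, because a negated class ignores that flag.
FIND = r"(\ *)\/\/\{\#\#\ *([a-z,0-9,_]+)\}(\ *)([^\n]*)"
REPL = r"\1{$ifdef \2}\3\4 {$endif}"  # hypothetical, Python syntax

line = "//{## identifier} Inc (Index) ; { a comment }  \n"
group4 = re.match(FIND, line, flags=re.DOTALL).group(4)
fixed = re.sub(FIND, REPL, line, flags=re.DOTALL)

print(repr(group4))  # 'Inc (Index) ; { a comment }  '
print(repr(fixed))   # '{$ifdef identifier} Inc (Index) ; { a comment }   {$endif}\n'
```

Unlike the (.*\S) variant, trailing spaces stay inside group 4; only the newline is excluded.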
