Why does this regular expression in cmd findstr work?

Question

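I need to create a cmd script (which I have somehow managed to do) that extracts some lines of text from a series of files and puts them into a new txt file.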

The source files look like this:

%
!
! AAA
!
! ------------------------ SOME TEXT ABCDEFGHIJKLMN --------------------------
!
! BBB
! ----------------------------------------------------------------------------
! T5 PUNTA ø 6.5/9.5~  $ 63~
! ----------------------------------------------------------------------------
! T12 PUNTA ø 2.5~  $ 39~
! ----------------------------------------------------------------------------
! 
! SOME OTHER TEXT
! 
!  1]  ABC
!  2]  DEF
!  3]  ...

OTHER LINE 1
OTHER LINE 2
ETC

%

The lines I need to extract are the ones between two "! ----------------------------------------------------------------------------" lines, so in this case T5 PUNTA ø 6.5/9.5~ $ 63~ and T12 PUNTA ø 2.5~ $ 39~.

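I was trying out some regular expressions with findstr to match the lines that contain only a ! right after the relevant lines, which would signal the end of the search, until I came up (purely by chance) with an instruction that matches all the lines I need (by luck, I guess).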

The snippet is this:

@echo off
setlocal enabledelayedexpansion
if exist output.txt ( break > output.txt )
for /r <path> %%g in (<filename>) do (
    ...
    for /f "tokens=* delims= " %%a in (%%g) do (
        echo %%a | findstr /r /c:^\!$ >nul
        if errorlevel 1 (...)
        ) else ( echo %%a >> srcoutput.txt
            ...
        )
    )
)

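Please focus on the instruction echo %%a | findstr /r /c:^\!$ >nul. For reasons I do not understand, it matches only the lines T5 PUNTA ø 6.5/9.5~ $ 63~ and T12 PUNTA ø 2.5~ $ 39~. That is exactly what I want, but I have no idea why it works!

Can anyone help me understand why this simple expression ^\!$ works? In my (wrong) understanding, it should only match a line that has a ! at both the beginning and the end (I escaped the ! because it does not work otherwise).

Thanks in advance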

Answer 1

The actual command line:

echo %%a | findstr /r /c:^\!$ >nul

only returns lines that contain a $ character.

This is what happens, step by step:

The command line is parsed to (supposing %%a holds <expanded text>):
```
  echo <expanded text> | findstr /r /c:\!$ >nul
```
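So the (unquoted) caret (^) disappears, because it is the escape character of cmd; since \ has no special meaning to cmd, you could have omitted the ^ after all;
Since delayed expansion is enabled (which is actually not necessary), the ! sign disappears, because there is only one of them, so the command line becomes: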
```
  echo <expanded text> | findstr /r /c:\$ >nul
```
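The \ symbol acts as an escape character (though one specific to findstr!), so the $ symbol loses its special meaning in regular expression (/R) mode (namely anchoring the match to the end of a line) and is therefore treated as a literal character;
The left side of the pipe passes the text <expanded text> (with a trailing SPACE, because there is one in front of the |), and the right side ends up searching that text for a literal $ character.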
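The net effect of these steps can be checked in isolation. The following is only a minimal sketch, not part of the original answer: the two sample lines are adapted from the question (with the ø left out to sidestep code-page issues), and delayed expansion is disabled so that the ! in the samples stays untouched. It feeds each line to findstr with the effective pattern \$.

```batch
@echo off
setlocal DisableDelayedExpansion
rem Sketch: test two sample lines against the effective pattern \$,
rem which findstr treats as a literal dollar sign in /R mode.
for %%L in ("! T5 PUNTA 6.5/9.5~  $ 63~" "! SOME OTHER TEXT") do (
    rem echo( prints the text without adding a trailing SPACE.
    echo(%%~L| findstr /R /C:"\$" > nul && (
        echo match:    %%~L
    ) || (
        echo no match: %%~L
    )
)
endlocal
```

The first sample contains a $ and should be reported as a match; the second contains none and should not.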

You would get exactly the same result as the original command line with the following:

echo %%a | findstr /C:$ > nul

However, I would rather write it as:

echo(%%a| findstr /C:"$" > nul

to avoid the trailing SPACE and to safely echo any text.
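The trailing SPACE can be made visible with a pattern that requires a SPACE right before the end of the line. This is just an illustrative sketch; the sample text abc is arbitrary and not taken from the question.

```batch
@echo off
rem Sketch: "abc" is an arbitrary sample text.
rem The regular expression "abc $" only matches if a SPACE precedes the end of the line.
echo abc | findstr /R /C:"abc $" > nul && echo plain form: trailing SPACE present
echo(abc| findstr /R /C:"abc $" > nul || echo compact form: no trailing SPACE
```

If the behaviour described above holds, both messages should be printed: the first echo passes the SPACE in front of the | along with the text, while the second one does not.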

For this task, I would probably choose another approach (see all the explanatory rem remarks):

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "_ROOT=D:\Target\Path"        & rem // (path to root directory)
set "_MASK=*.txt"                 & rem // (name or mask of files to process)
set "_SAVE=D:\Path\To\output.txt" & rem // (location of output file)
rem // Gather line-feed character:
(set ^"_LF=^
%= blank line =%
^")
rem // Gather carriage-return character:
for /F %%C in ('copy /Z "%~f0" nul') do set "_CR=%%C"

rem // Open output file only once and write to it:
> "%_SAVE%" (
    rem // Find matching files and loop through them:
    for /R "%_ROOT%" %%F in ("%_MASK%") do (
        rem // Check for file existence (only necessary when a dedicated name is given):
        if exist "%%~F" (
            rem // Store path of current file:
            set "FILE=%%~F"
            rem // Toggle delayed expansion to avoid troubles with `!`:
            setlocal EnableDelayedExpansion
            rem // Remove remaining quotes (only necessary when a dedicated name is given):
            set "FILE=!FILE:"=!"
            rem /* Do a multi-line search by `findstr`, which only returns the first line;
            rem    the searched string is:
            rem     # anchored to the beginning of a line,
            rem     # an `!`, a space and a `T`, then
            rem     # some arbitrary text (without line-breaks), then
            rem     # a line-break, then another `!` and a space, then
            rem     # a sequence of one or more `-`,
            rem     # anchored to the end of a line;
            rem    only the portion before the explicit line-break is then returned: */
            findstr /R /C:"^^^! T.*~!_CR!!_LF!^! --*$" "!FILE!"
            endlocal
        )
    )
)

endlocal
exit /B

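This does not exactly search for the lines between ! --- etc.; instead, it searches for two adjacent lines, of which the first begins with ! + SPACE + T and ends with ~, and the second consists of ! + SPACE + a sequence of one or more -.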

If the input files contain Unix-/Linux-style line-breaks rather than DOS-/Windows-style ones, replace !_CR!!_LF! in the findstr search string in the script with !_LF!.

Answer 2

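I have decided to post this as a potential way of achieving your stated goal. It uses a different approach from that of the currently accepted answer: the idea is to retrieve the line numbers of the ! ----etc. lines and then determine whether a line between any two of them has the required content. This means that it does not try to match particular content between those lines, so it should work regardless of which characters your strings are made up of.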
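The first step of that idea, collecting the numbers of the separator lines, can be tried on its own. The following is a minimal sketch under the assumption that the input file is named somefile.ext (a placeholder); it only lists the line numbers and does not extract anything yet.

```batch
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem Placeholder input file name; adjust as needed.
set "InFile=somefile.ext"
rem findstr /N prefixes each matching line with "<number>:";
rem "delims=:" keeps only the part in front of the first colon.
for /F "delims=:" %%N in ('findstr /R /N /C:"^! --*$" "%InFile%"') do (
    echo separator at line %%N
)
endlocal
```

A line that sits exactly one line after one separator and one line before the next is then a candidate for extraction.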

@Echo Off
SetLocal EnableExtensions
Set "InFile=somefile.ext"
Set "OutFile=someoutfile.ext"
Set "$#="&For /F "Delims=:" %%G In (
    '"%__AppDir__%findstr.exe /RNC:"^! --*$" "%InFile%""')Do (
    Set /A _2=%%G-2&Call Set "$#= %%G %%$#%%"&Call Set "= %%_2%% %%%%")
If Not Defined $# Echo No Matches&%__AppDir__%timeout.exe -3&Exit /B
SetLocal EnableDelayedExpansion
For %%G In (%%)Do If "!$#: %%G =!"=="%$#%" Set "=!: %%G =!"
For %%G In (%%)Do Set /A _1=%%G+1&Set "= !_1! !!"
EndLocal&(For %%G In (%%)Do For /F "Tokens=1*Delims=]" %%H In (
    '%__AppDir__%find.exe /V /N "" "%InFile%"^
     ^|%__AppDir__%findstr.exe "^\[%%G\]"')Do Echo %%I)>"%OutFile%"
GoTo :EOF

Just change the input and output file names on lines 3 and 4 as needed.

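Please note that I was unable to test this, so it may not work, or it may work in the wrong way. Please test it on a variety of similarly formatted files before using it for real!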
