Regex pattern in batch is not matching subpatterns

I'm trying to match the values ABC-2131,ABC-345,DEF-3534 and EFG-456,FGF-4546,HJI-23423 against the following regex in a batch script:

^([aA-zZ]*-[0-9]*)([,]*[aA-zZ]*-[0-9]*)*

The subpatterns of the regex are not matched correctly in the batch script.

For example, with subpattern grouping, ^([aA-zZ]*-[0-9]*) only matches (ABC-234), not ABC-234.

Here is the code:

echo(%LogMsg%|findstr /r /c:"^([aA-zZ]*-[0-9]*)([,]*[aA-zZ]*-[0-9]*)*" >nul && (
    echo FOUND
) || (
    echo NOT FOUND
)

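The findstr command supports only a very small subset of regular expressions. In particular, there is no grouping and no alternation, so ( and ) are matched as literal characters, and * applies only to the single character or character class immediately before it. In addition, the length of a search expression is quite limited.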
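For comparison, a full regex engine does treat (...) as a group. The following Python sketch (Python is used only because it is easy to run; it is not part of the batch solution) shows that the question's pattern captures ABC-234 without any literal parentheses in the input. STRICT_PATTERN and matches_strict are hypothetical names, and the "LETTERS-DIGITS items separated by single commas" format they encode is an assumption about the intent. The last line shows that [aA-zZ] is not a letters-only class.

```python
import re

# The pattern from the question. In Python's `re`, (...) is a capturing
# group and (...)* repeats the whole group, unlike in findstr.
QUESTION_PATTERN = r"^([aA-zZ]*-[0-9]*)([,]*[aA-zZ]*-[0-9]*)*"

# Hypothetical stricter pattern (assumed intent): LETTERS-DIGITS items
# separated by single commas; fullmatch requires the whole string to match.
STRICT_PATTERN = r"[A-Za-z]+-[0-9]+(?:,[A-Za-z]+-[0-9]+)*"


def matches_strict(text: str) -> bool:
    """Return True if text is a comma-separated list of LETTERS-DIGITS items."""
    return re.fullmatch(STRICT_PATTERN, text) is not None


if __name__ == "__main__":
    # Group 1 captures "ABC-234" even though the input has no parentheses.
    print(re.match(QUESTION_PATTERN, "ABC-234").group(1))
    for sample in ("ABC-2131,ABC-345,DEF-3534", "EFG-456,FGF-4546,HJI-23423"):
        print(sample, matches_strict(sample))
    # [aA-zZ] means {a} + the range A..z + {Z}. In ASCII the range A..z also
    # covers [ \ ] ^ _ and the backtick, so "_" matches.
    print(re.fullmatch(r"[aA-zZ]", "_") is not None)
```

Note that the question's pattern has no $ anchor, so in a full engine it would also accept any string that merely starts with a matching prefix.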

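One possible approach is to let a for loop split the string at the commas, which are standard token separators in batch, and then check whether each item matches a specific pattern: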

@echo off
rem // Assign sample string:
set "LogMsg=ABC-2131,ABC-345,DEF-3534,EFG-456,FGF-4546,HJI-23423"

rem /* Assign string to `for` meta-variable, just to protect potential
rem    special characters without using delayed variable expansion: */
for %%J in ("%LogMsg%") do (
    rem /* Loop through comma-separated items
    rem    (actually, any sequence consisting of SPACE, TAB, `,`, `;`,
    rem    `=`, VTAB, FF, NBSP is treated as a token separator): */
    for %%I in (%%~J) do (
        rem // Assign current item to variable:
        set "ITEM=%%I"
        rem // Match item against predefined pattern (`>nul` hides the matched line):
        cmd /V /C echo(!ITEM!| findstr /I "^[A-Z][A-Z][A-Z]-[0-9][0-9]*$" >nul || goto :SKIP
    )
)
rem // This point is reached when all items match:
echo FOUND
exit /B
rem // This point is reached when any item does not match:
:SKIP
echo NOT FOUND
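To try the logic of the script above outside cmd, here is a minimal Python sketch of the same split-then-check idea. It is an approximation, not a model of cmd's parser: the separator set is an assumption taken from the script's comment (NBSP is left out because its byte value depends on the code page), and quoting, wildcards and other `for` subtleties are ignored. all_items_match is a hypothetical name.

```python
import re

# Assumed token separators: SPACE, TAB, comma, semicolon, equals sign,
# VTAB and FF (NBSP omitted, since it is code-page dependent).
SEPARATORS = re.compile(r"[ \t,;=\v\f]+")

# Same per-item pattern as the findstr call: three letters, a hyphen, then
# one or more digits. IGNORECASE plays the role of findstr /I; ASCII keeps
# [A-Z] from also matching a few non-ASCII letters under IGNORECASE.
ITEM_PATTERN = re.compile(r"[A-Z][A-Z][A-Z]-[0-9][0-9]*", re.IGNORECASE | re.ASCII)


def all_items_match(log_msg: str) -> bool:
    """Split on the separators and require every item to match.

    all() stops at the first failing item, much like `goto :SKIP` leaves
    the loops early. An empty string yields no items, so it returns True.
    """
    items = [item for item in SEPARATORS.split(log_msg) if item]
    return all(ITEM_PATTERN.fullmatch(item) for item in items)


if __name__ == "__main__":
    log_msg = "ABC-2131,ABC-345,DEF-3534,EFG-456,FGF-4546,HJI-23423"
    print("FOUND" if all_items_match(log_msg) else "NOT FOUND")
```

Checking item by item keeps each pattern short, which is the same reason the batch version sidesteps findstr's missing grouping and its length limit.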

Note that findstr has a few pitfalls:

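  • an upper-case character class [A-Z] also matches lower-case letters (except z), which is why I decided to do a case-insensitive search (/I);
  • a character class like [A-z] may also match special letters such as Å or à, depending on the current code page;
  • a character class like [0-9] may also match some special characters such as ² or ³, depending on the current code page;
  • to prevent such problems, you need to avoid character ranges and list every allowed character, e.g. [0123456789]; but keep in mind that the length of the search string is limited.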
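The workaround in the last point can be generated rather than typed by hand. This Python sketch uses a hypothetical helper, build_explicit_pattern, that spells out every letter and digit instead of using ranges, producing a candidate replacement for the findstr pattern in the script above. Listing only upper-case letters is an assumption that relies on the /I switch. It also prints the pattern's length next to that of the ranged version, which shows how quickly explicit classes eat into the limited search-string length; whether the result fits findstr's actual limit is not checked here.

```python
import re
import string

# Ranged pattern from the batch script, for comparison.
RANGED_PATTERN = "^[A-Z][A-Z][A-Z]-[0-9][0-9]*$"


def build_explicit_pattern(letters: int = 3) -> str:
    """Hypothetical helper: the same pattern with every character listed.

    Only upper-case letters are listed, assuming a case-insensitive (/I)
    search as in the script above.
    """
    upper = "[" + string.ascii_uppercase + "]"  # [ABC...XYZ], no range
    digit = "[" + string.digits + "]"           # [0123456789], no range
    return "^" + upper * letters + "-" + digit + digit + "*$"


if __name__ == "__main__":
    explicit = build_explicit_pattern()
    print(explicit)
    print(len(RANGED_PATTERN), "->", len(explicit))  # 29 -> 112
    # Sanity check with Python's re (IGNORECASE stands in for /I).
    print(re.match(explicit, "abc-2131", re.IGNORECASE) is not None)
```

The generated string could then be pasted into the findstr call in place of the ranged pattern, at roughly four times the length.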