Regex character exclusion

Question

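I am trying to validate a string that contains one or more tokens of the form [Item~SomeNameHere]. One variant of that token is [Item~Increment***], where *** can be a number of things that I will validate further later. The key point is that the Increment token must be the final token in the string. So [Item~Increment] is valid and _[Item~modifiedDate][Item~Increment(#)] is valid, but _[Item~Increment(#)][Item~modifiedDate] is not. The goal is a very flexible breadcrumbs feature, where these tokens are later replaced with other data to build a final file or folder name.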

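To that end, I started with the regex \[Item~Increment.*\], which does find the token in the string. I then changed it to \[Item~Increment.*\]$ to look for the token at the end of the string, and that works until I hit an example like [Item~Increment(#)][Item~modifiedDate], where .* matches (#)][Item~modifiedDate and produces True where a False is needed. Somehow I need . to mean any character except [ or ], zero or more times. But \[Item~Increment[.-\[\]]*\]$ does not do the job, and I am now out of my RegEx depth. I also tried negation with \[Item~Increment[^\[\]]*\]$, but that fails too, because it is always false.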
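For reference, here is a minimal sketch comparing the greedy .* with a negated character class on two of the sample strings. The variable names are illustrative and not from the original post. Inside a character class, . is a literal dot and - forms a range, so [.-\[\]] includes the brackets rather than excluding them. The negated form [^\[\]] appears to behave as intended here, which suggests the "always false" result came from something else in the test.

```powershell
# Two of the sample strings from the question
$valid   = '[Item~modifiedDate][Item~Increment(#)]'
$invalid = '[Item~Increment(#)][Item~modifiedDate]'

# Greedy .* can run across "][" and still reach the final "]"
$greedy  = '\[Item~Increment.*\]$'
# [^\[\]]* stops at the first bracket, so the token must close right at the end
$negated = '\[Item~Increment[^\[\]]*\]$'

$greedyResults  = @($valid, $invalid) | ForEach-Object { $_ -match $greedy }
$negatedResults = @($valid, $invalid) | ForEach-Object { $_ -match $negated }

"greedy : $greedyResults"    # greedy : True True
"negated: $negatedResults"   # negated: True False
```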

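EDIT: To clarify, the Increment token may occur only once, and only at the end. However, there may be other tokens of the form [Item~???] or [???~???] earlier in the string, and there may be literal characters as well. So -[Some~String]_[Item~Date]_[Item~Increment] is valid. Based on @wiktor-stribiżew's answer, I refined things a bit and transcribed the pattern into PowerShell nomenclature, and now I have this...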

$breadcrumbs = @('none', '[Item~Increment]', '[Item~Increment(#)]', '[Item~modifiedDate][Item~Increment(#)]', '[Item~Increment][Item~modifiedDate]', '[Item~Increment][Item~Increment]')
$pattern = '(?:\[Item~[^][]*])*\[Item~Increment[^][]*]$'

CLS
foreach ($breadcrumb in $breadcrumbs) {
    Write-Host "$([regex]::matches($breadcrumb, $pattern).Count) $breadcrumb"
}

producing...

0 none
1 [Item~Increment]
1 [Item~Increment(#)]
1 [Item~modifiedDate][Item~Increment(#)]
0 [Item~Increment][Item~modifiedDate]
1 [Item~Increment][Item~Increment]

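~~In theory the first one should fail because there is no Increment token.~~ In theory the first one should pass because it contains no incorrect token, the last one should fail because it has two Increment tokens, and the second to last should fail because its Increment token is not at the end of the string.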

But this regex fails the final test, I assume because [Item~Increment][Item~Increment] is being matched as

[Item~Increment
][Item~Increment
]

where ][Item~Increment is the variable content.

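Is this an error in my transcription to PowerShell? Or does the RegEx need something more to ensure that [ and ] cannot occur inside a token, so that this example would produce either a count of 2 or a failure? I would not mind needing a second test on the count, because that would be a useful error for the user. But as it stands I get a count of 1, which is not valid.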
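One way to check what happened is to look at the matched text rather than only the count. A small sketch using the same pattern ($m is just an illustrative name): the optional leading group (?:\[Item~[^][]*])* accepts any [Item~...] token, including an Increment one, so the whole string comes back as a single match rather than being split at the brackets.

```powershell
$pattern = '(?:\[Item~[^][]*])*\[Item~Increment[^][]*]$'
$text    = '[Item~Increment][Item~Increment]'

# Match returns the first (and here the only) match
$m = [regex]::Match($text, $pattern)

$m.Value    # [Item~Increment][Item~Increment]
$m.Length   # 32, the length of the whole string
```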

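EDIT2: Doj's answer is also interesting, and shorter too. Using \[Item~Increment[^\[\]]*\] I get a count of 2 for the last example, but order is not handled. Anchoring it to the end of the string like this, \[Item~Increment[^\[\]]*\]$, handles order but no longer catches multiples. Ugh.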

EDIT3: Combining the two patterns from Doj's answer gets me this

foreach ($breadcrumb in $breadcrumbs) {
    $incrementCount = ([regex]::matches($breadcrumb, '\[Item~Increment[^\[\]]*\]')).Count
    if ($incrementCount -eq 0) {
        Write-Host "$breadcrumb good"
    } elseif ($incrementCount -gt 1) {
        Write-Host 'Duplicate [Item~Increment] tokens'
    } else {
        if (([regex]::matches($breadcrumb, '\[Item~Increment[^\[\]]*\]$')).Count -ne 0) {
            Write-Host "$breadcrumb good"
        } else {
            Write-Host '[Item~Increment] token not at the end of the string'
        }
    }
}

producing...

none good
[Item~Increment] good
[Item~Increment(#)] good
[Item~modifiedDate][Item~Increment(#)] good
[Item~Increment] token not at the end of the string
Duplicate [Item~Increment] tokens

I'm off to the races!

EDIT4: Because learning two ways to do something is always better than learning just one, I adapted Wiktor's approach for PS, like so...

$breadcrumbs = @('none', '[Item~Increment]', '[Item~Increment%]', '[Item~modifiedDate][Item~Increment%]', '[Item~Increment][Item~modifiedDate]', '[Item~Increment][Item~Increment%]', '[Item~Increment]_')

CLS
foreach ($breadcrumb in $breadcrumbs) {
    if ($breadcrumb -match '\A(?:\[Item~(?!Increment[^][]*])[^][]*])*\[Item~Increment[^][]*]\z') {
        Write-Host "$breadcrumb good"
    } else {
        Write-Host "!!! $breadcrumb"
    }
}

This produces an incorrect result for none.

!!! none
[Item~Increment] good
[Item~Increment%] good
[Item~modifiedDate][Item~Increment%] good
!!! [Item~Increment][Item~modifiedDate]
!!! [Item~Increment][Item~Increment%]
!!! [Item~Increment]_

I thought there were two incorrect results there, but I was using the wrong pattern. :(

Answer 1

You can use

\A(?!.*\[Item~[^][]*])|\A(?:\[Item~(?!Increment[^][]*])[^][]*])*\[Item~Increment[^][]*]\z

See this regex demo.

Details:

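\A(?!.*\[Item~[^][]*]) - at the start of the string, a negative lookahead checks whether an [Item~...] substring appears after any zero or more chars other than LF, as many as possible; if one is found, this alternative fails
| - or
\A - start of string
(?: - start of a non-capturing group (a container used to quantify a sequence of patterns):
- \[Item~ - an [Item~ substring
- (?!Increment[^][]*]) - a negative lookahead that fails the match if, immediately to the right of the current position, there is an Increment string, then zero or more chars other than [ and ], and then a ] char
- [^][]* - zero or more chars other than [ and ]
- ] - a ] char
)* - repeat the pattern sequence zero or more times
\[Item~Increment - an [Item~Increment string
[^][]*] - zero or more chars other than [ and ], and then a ] char
\z - the very end of the string.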
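As a quick sanity check of the breakdown above, the following sketch runs the pattern against a handful of breadcrumbs taken from the question. The $samples and $results names are illustrative only.

```powershell
$pattern = '\A(?!.*\[Item~[^][]*])|\A(?:\[Item~(?!Increment[^][]*])[^][]*])*\[Item~Increment[^][]*]\z'

$samples = @(
    'none',                                     # no Item token at all
    '[Item~Increment]',                         # single Increment token
    '[Item~modifiedDate][Item~Increment(#)]',   # Increment token is last
    '[Item~Increment][Item~modifiedDate]',      # Increment token is not last
    '[Item~Increment][Item~Increment]',         # duplicate Increment token
    '[Item~Increment]_'                         # trailing literal after the token
)

# Pair each sample with the result of the -match test
$results = foreach ($s in $samples) {
    [pscustomobject]@{ Breadcrumb = $s; Valid = ($s -match $pattern) }
}

$results | Format-Table -AutoSize
```

With these inputs, the first three samples should come out as valid and the last three as invalid. 'none' is accepted by the first alternative because it contains no [Item~...] token at all.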

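If you do not want to use that big pattern and need to provide two different error messages, you can unpack the alternation into separate regex checks.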

See this PowerShell demo:

# $breadcrumbs is the sample array defined in the question
$rx_1 = '\[Item~[^][]*]' # Is there any Item token in the string?
$rx_2 = '\A(?:\[Item~[^][]*])*\[Item~Increment[^][]*]\z' # Only Item tokens, ending with an Increment token
$rx_3 = '\[Item~Increment[^][]*](?!\z)' # An Increment token that is not at the end of the string
foreach ($breadcrumb in $breadcrumbs) {
    if ($breadcrumb -notmatch $rx_1) {  # If no Item token is in the string
        Write-Host "$breadcrumb good"   # it is valid
    } else {
        if ($breadcrumb -match $rx_2) {         # If the string only contains Item tokens
            if ($breadcrumb -notmatch $rx_3) {  # ...and no Increment token appears before the end
                Write-Host "$breadcrumb good"   # it is good
            } else {
                Write-Host 'An [Item~Increment] token not at the end of the string!'
            }
        } else {
            Write-Host 'No [Item~Increment] token at the end of the string or invalid format!'
        }
    }
}

Output:
