Using Select-String to Match Multiple Single-Line Patterns and Write to Output

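I'm trying to build a simple script that uses regular expressions to match multiple patterns on a single line, working through the entire input file, and writes the results to an output file. But I've hit a wall:

Sample text: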

BMC12345 COMBINED PHASE STATISTICS:  31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345S', 0 ROWS SELECTED BUT DISCARDED DUE TBMC123456 COMBINED PHASE STATISTICS:  10 PHYSICAL (10 LOGICAL) RECORDS DISCARDED TO SYSDISC

Here is what I have so far:

$table = [regex] "'.*'"
$discard = [regex] "\d* PHYSICAL"

Select-String -Pattern ($table, $discard) -AllMatches .\test.txt | foreach {
    $_.Matches.Value
} > output.txt

Output:

'KDDT111D.DIH0345S'

Desired output:

'KDDT111D.DIH0345S' 10 Physical

For some reason, I can't get both patterns written to output.txt. Ideally, once I get this working, I'd like to use Export-Csv to get something cleaner, for example:

|KDDT111D|DIH0345S|10 Physical|

I think you will find the -match operator better suited to this. [grin] Using named capture groups on the sample stored in $InStuff, this ...

$InStuff -match ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) \(.+"

... gives the following set of matches ...

Name                           Value                                                                              
----                           -----                                                                              
Space                          KDDT111D                                                                           
SubSpace                       DIH0345S                                                                           
Discarded                      10 PHYSICAL                                                                        
0                              BMC12345 COMBINED PHASE STATISTICS: 31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345...

The named matches can be addressed via $Matches.<the capture group name>.
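As a minimal sketch of how those named captures could be collected into an object (a convenient shape for later CSV export), the following reuses the sample line and pattern from above. The variable names $pattern and $row are illustrative, and the Trim() call is an assumption to cope with the double space after the colon in the sample as posted in the question.

```powershell
# Sample line from the question, stored in $InStuff as in the answer above.
$InStuff = "BMC12345 COMBINED PHASE STATISTICS:  31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345S', 0 ROWS SELECTED BUT DISCARDED DUE TBMC123456 COMBINED PHASE STATISTICS:  10 PHYSICAL (10 LOGICAL) RECORDS DISCARDED TO SYSDISC"

$pattern = ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) \(.+"

# With a single string on the left-hand side, -match returns $true or $false
# and fills the automatic $Matches hashtable with the named groups.
if ($InStuff -match $pattern) {
    $row = [pscustomobject]@{
        Space     = $Matches.Space
        SubSpace  = $Matches.SubSpace
        Discarded = $Matches.Discarded.Trim()   # drop the leading space, if any
    }
    $row
}
```

Keeping the values as properties rather than a joined string leaves the choice of output format (table, CSV, delimited text) to a later step.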

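You're running into a Select-String limitation: the .Matches property of the [Microsoft.PowerShell.Commands.MatchInfo] objects that Select-String emits for each input object (line) contains only the (potentially multiple) matches for the first regex passed to the -Pattern parameter.[1]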

You can work around this by passing a single regex that combines the input regexes via alternation (|):

Select-String -Pattern ($table, $discard -join '|') -AllMatches .\test.txt | 
  ForEach-Object { $_.Matches.Value } > output.txt

A simplified example:

# ('f.', '.z' -join '|') -> 'f.|.z'
'foo bar baz' | Select-String -AllMatches ('f.', '.z' -join '|') |
  ForEach-Object { $_.Matches.Value }

The above yields:

fo
az

This demonstrates that the matches for both regexes were reported.

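A caveat regarding output order: with alternation (|), the matches for a given input string are reported in the order in which they are found in the input, not in the order in which the regexes were specified.
That is, both -Pattern 'f.|.z' and -Pattern '.z|f.' above produce the same output order.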
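If you need to know which of the combined regexes produced a given match, one possible approach (not part of the original workaround) is to wrap each alternative in a named capture group and check which group succeeded. The group names first and second below are hypothetical labels chosen for this sketch.

```powershell
# Each alternative gets its own named group so the source regex can be identified.
$result = 'foo bar baz' |
  Select-String -AllMatches '(?<first>f.)|(?<second>.z)' |
  ForEach-Object {
    foreach ($m in $_.Matches) {
      # Only the group belonging to the alternative that matched reports Success.
      if ($m.Groups['first'].Success) { "first: $($m.Value)" }
      else { "second: $($m.Value)" }
    }
  }
$result
```

The matches still arrive in input order, but each one now carries a label, so they can be regrouped or sorted afterwards if the regex order matters.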


[1] The problem exists as of Windows PowerShell v5.1 / PowerShell Core 6.2.0-preview.4 and is discussed in this GitHub issue.

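Thanks to the contributors for the ideas and the learning experience. I was able to get the desired output by combining the two answers I received.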

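I found that the -match operator returned only the first occurrence of the regex pattern in the source file, so I needed to add a foreach loop to return matches across the entire log file.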
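The per-line loop can be sketched as follows. The log lines here are hypothetical, heavily shortened stand-ins for the real ones, and the pattern is simplified for this sketch only; it is not the regex used in the actual script.

```powershell
# Hypothetical, shortened log lines; the real lines are much longer.
$lines = @(
    "SPACE 'DB1.TS1' STATISTICS: 0 PHYSICAL",
    "SPACE 'DB1.TS2' STATISTICS: 11 PHYSICAL",
    "SPACE 'DB2.TS3' STATISTICS: 24845 PHYSICAL"
)

# Simplified pattern for this sketch: [1-9]\d* skips counts of zero.
$pattern = "SPACE '(?<Space>[^.']+)\.(?<SubSpace>[^']+)'.*: (?<Discarded>[1-9]\d* PHYSICAL)"

$result = foreach ($line in $lines) {
    # Testing one line at a time refreshes $Matches on every successful match.
    if ($line -match $pattern) {
        $Matches.Space, $Matches.SubSpace, $Matches.Discarded -join '|'
    }
}
$result
```

Because $Matches only ever holds the captures of the most recent successful match, reading it inside the loop body, immediately after the test, is what makes every matching line show up in the result.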

I also modified the regex to include only discard values greater than 0.

Sample text:

BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDAICH0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  3499604 ROWS SELECTED FOR SPACE 'KDDT000D.KDAIND0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  1 ROWS SELECTED FOR SPACE 'KDDT000D.KDCISR0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  9185775 ROWS SELECTED FOR SPACE 'KDDT000D.KDIADR0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  11 PHYSICAL (11 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDICHT0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  2387375 ROWS SELECTED FOR SPACE 'KDDT000D.KDICMS0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  1632821 ROWS SELECTED FOR SPACE 'KDDT000D.KDIPRV0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDLADD0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  24845 PHYSICAL (24845 LOGICAL) RECORDS DISCARDED TO SYSDISC

Example:

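    # The trailing [1-9][0-9]* only matches a non-zero LOGICAL count,
    # so lines with 0 discarded records are skipped.
    $regex = ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) .[1-9][0-9]*\s\b"

    # Date stamp (MM_dd_yy) used in the output file name.
    $timestamp = Get-Date -Format "MM_dd_yy"
    $dir = "C:\Users\JonMonJovi"

    # Keep only the matching lines, then join the named captures with "|".
    Get-Content $dir\*.log.txt | Where-Object {
        $_ -match $regex
    } | ForEach-Object {
        $Matches.Space, $Matches.SubSpace, $Matches.Discarded -join "|"
    } > C:\Users\JonMonJovi\Discarded\Discard_Log_$timestamp.txt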

Output:

KDDT000D|KDIADR0S| 11 PHYSICAL
KDDT000D|KDLADD0S| 24845 PHYSICAL

From here I can import the pipe-delimited .txt output file into Excel, which meets my requirements.
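For anyone who would rather use Export-Csv, as mentioned in the question, a rough sketch follows. It assumes the captured values have already been collected as objects. The two rows simply repeat the output shown above (with the leading space trimmed), and the temp-file name Discard_Log_sketch.csv is a hypothetical placeholder.

```powershell
# Rows mirroring the output above; in a real script these would be built
# from $Matches inside the loop rather than typed in by hand.
$rows = @(
    [pscustomobject]@{ Space = 'KDDT000D'; SubSpace = 'KDIADR0S'; Discarded = '11 PHYSICAL' }
    [pscustomobject]@{ Space = 'KDDT000D'; SubSpace = 'KDLADD0S'; Discarded = '24845 PHYSICAL' }
)

# Hypothetical output location in the temp directory.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'Discard_Log_sketch.csv'

# -Delimiter '|' writes a pipe-delimited file with a header row.
$rows | Export-Csv -Path $outFile -Delimiter '|' -NoTypeInformation

# Read the file back to confirm that the columns survive the round trip.
$roundTrip = @(Import-Csv -Path $outFile -Delimiter '|')
$roundTrip | Format-Table
```

Compared with joining strings by hand, this adds a header row and quotes values where needed, which may or may not suit the Excel import step.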