Powershell Regex: Reading a multi-line string between two points

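I have a common problem with a PowerShell regex reading multi-line records. I have read threads asking similar questions but could not quite find a solution that fits my case.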

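My file consists of variable-length, multi-line records. The records I am interested in start with 01 or 02, followed by V or M. A record ends as soon as another record starts or a batch record starting with "50" is found. The first three characters of each line identify the record.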

i.e. 01V (start of record - content follows) 01

I am trying to use a regex to read the individual records by identifying where each one starts and ends.

What I have so far is based on this answer: Match everything between two words in Powershell

#Read the file as a single string
$FilePath = "blaablaablaa"
$TestFile = get-content $FilePath | Out-String 

#(?<= Assert that this matches before the current position (lookbehind)
# 0(1|2)(V|M) if the record is 01V or 01M or 02V or 02M 
# ) End assertion 
# .+? Match any number of characters, few as possible
# (?= Until a record starting with 70 is found  
# ) End of look ahead
$regex = [regex] '(?is)(?<=0(1|2)(V|M)).+?(?=70)'
echo $TestFile |  select-string -Pattern $regex 

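The above works on a single-line string if I remove the pipe to Out-String; with the Out-String pipe it returns the whole file. I guess I am not handling the /n characters correctly.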

Any suggestions? The input file looks roughly like this:

00 date
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 01 01 01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=5xxxxxxxxxxxxxxxxxxxxxxxxxxx
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 01 01 01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=9xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01[=11=]xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxx BatchTotal
90 xxxxxxxxxxxx FILETotal

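The desired output is the file split into individual records, which are delimited by "50", "90", or the start of another record. For example, this is the final record: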

01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01[=12=]xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01xxxxxxxxxxxxxxxxxxxxxxxxxxx

Assuming (from your description) that you also want to match the part from one 01M up to the next 01M, and then that one up to the 50, this does the trick:

(?mis)^0[12][VM](?:[^\n]|\n(?!0[12][VM]|50|90))+

Explanation: after matching 0, then 1 or 2, then V or M, the part inside (?:...) is this:

[^\n]|\n(?!0[12][VM]|50|90)

This means:

match any character that is not a newline, or

a newline that is not followed (?!...) by the start of a new record, 50, or 90.

online Regex101 demo
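
To try this pattern from PowerShell, something like the following sketch should work. It assumes the file content is already in a single string; here that string is built inline from a shortened, illustrative version of the sample data, and the variable names are made up for the example. Note that .NET has no inline g option, so the sketch relies on [regex]::Matches to return every match instead. It also shows a Select-String variant, since Select-String prints the whole input string by default and keeps the matched text in the Matches property.

```powershell
# Shortened stand-in for the sample file (illustrative data, not the real file)
$lines = @(
    '00 date'
    '01Mxxxx'
    '01 01 01 01=0xxxx'
    '01=5xxxx'
    '01Mxxxx'
    '01=9xxxx'
    '50 xxxx xxxx'
    '01Vxxxx'
    '01 xxxx'
    '01$A xxxx'
    '01xxxx'
    '50 xxxx BatchTotal'
    '90 xxxx FILETotal'
)
$text = $lines -join "`n"

# (?m) lets ^ match at the start of every line.
# After the 0[12][VM] header, consume non-newline characters, or a newline
# that is not followed by a new record header, 50 or 90.
$pattern = '(?m)^0[12][VM](?:[^\n]|\n(?!0[12][VM]|50|90))+'

# [regex]::Matches returns every match, so no "global" flag is needed
$records = [regex]::Matches($text, $pattern) | ForEach-Object { $_.Value }

# Select-String alternative: read the matched text from .Matches,
# because the default output is the entire input string
$viaSelectString = $text |
    Select-String -Pattern $pattern -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value }

$records.Count   # number of records found
$records[-1]     # the final 01V record
```

With the shortened data above this should yield three records: the two 01M records and the 01V record. If the real file uses Windows line endings, the matched text will still contain the carriage returns, because [^\n] also matches \r.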

Using your test data:

@'
00 date
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 01 01 01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=5xxxxxxxxxxxxxxxxxxxxxxxxxxx
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 01 01 01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=9xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01[=10=]xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxx BatchTotal
90 xxxxxxxxxxxx FILETotal
'@ | set-content testfile.txt


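# Read the whole file as one string so the regex can span lines
$Text = Get-Content ./testfile.txt -Raw

# (?ms): ^ and $ match at line breaks, and . also matches newlines.
# Group 1 lazily captures from 01M/01V up to the line break before a 50 or 90 line.
$regex = @'
(?ms)(01(?:M|V).+?)
(?:5|9)0.+?
'@

# Collect the captured record text from every match
$Records = 
[regex]::Matches($Text,$regex) |
foreach {$_.groups[1].value}

# Show the last record
$Records[-1]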

01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01[=10=]xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01xxxxxxxxxxxxxxxxxxxxxxxxxxx
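
One thing to be aware of: this pattern ends a record only at a 50 or 90 line, so the two consecutive 01M records in the sample come back as a single match. If each 01M/01V header should start its own record, a lookahead can stop the lazy match at whichever delimiter comes first. The following is only a sketch under that assumption, using shortened illustrative data and made-up variable names. The \r? is there to tolerate Windows line endings, and the sketch assumes the last record is always followed by a 50 or 90 line, as in the sample.

```powershell
# Shortened stand-in for the sample file (illustrative data, CRLF line endings)
$text = @(
    '00 date'
    '01Mxxxx'
    '01=5xxxx'
    '01Mxxxx'
    '01=9xxxx'
    '50 xxxx xxxx'
    '01Vxxxx'
    '01$A xxxx'
    '50 xxxx BatchTotal'
    '90 xxxx FILETotal'
) -join "`r`n"

# Lazily capture from a record header up to (not including) the line break
# that precedes the next record header, a 50 line or a 90 line.
$pattern = '(?ms)^(0[12][MV].+?)(?=\r?\n(?:0[12][MV]|50|90))'

$records = [regex]::Matches($text, $pattern) |
    ForEach-Object { $_.Groups[1].Value }

$records.Count   # number of records found
$records[0]      # the first 01M record on its own
```

Because the delimiter sits inside a lookahead, it is not consumed, so the next match can begin directly at the following header line.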