PowerShell Regex: Reading a multi-line string between two points
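I have a common problem with a PowerShell regular expression that reads multi-line records. I have read the threads asking similar questions, but I could not quite find a solution that fits my case.
My file consists of variable-length, multi-line records. The records I am interested in start with 01 or 02, followed by V or M. A record ends as soon as another record starts or a batch record starting with "50" is found. The first three characters of each line identify the record.
For example:
01V (start of record, content follows)
01
I am trying to use a regex to read the individual records by identifying their start and end.
What I have so far is based on this answer:
Match everything between two words in Powershell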
#Read the file as a single string
$FilePath = "blaablaablaa"
$TestFile = get-content $FilePath | Out-String
#( ?= Assert that this matches before the current position
# 0(1|2)(V|M) if the record is 01V or 01M or 02V or 02M
# ) End assertion
# .+? Match any number of characters, few as possible
# (?= Until a record starting with 70 is found
# ) End of look ahead
$regex = [regex] '(?is)(?<=0(1|2)(V|M)).+?(?=70)'
echo $TestFile | select-string -Pattern $regex
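The above works for a single-line string. If I remove the pipe to Out-String, it returns the whole file. I guess I am not handling the \n characters correctly.
Any suggestions? The input file looks roughly like this: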
00 date
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01
01
01
01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=5xxxxxxxxxxxxxxxxxxxxxxxxxxx
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01
01
01
01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=9xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01[=11=]xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxx BatchTotal
90 xxxxxxxxxxxx FILETotal
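The desired output is the file split into individual records, which are delimited by "50", "90", or the start of another record. For example, this is the final record: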
01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01[=12=]xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01xxxxxxxxxxxxxxxxxxxxxxxxxxx
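Assuming (based on your description) that you also want to match the part from one 01M to the next 01M, and then the part from that one up to 50, this should do the trick:
(?mis)^0[12][VM](?:[^\n]|\n(?!0[12][VM]|50|90))+
(The g flag used by some online regex testers is not a valid inline option in .NET, so it is left out here; [regex]::Matches returns all matches.)
Explanation: after matching 0, then 1 or 2, then V or M, the part inside (?:...) is this:
[^\n]|\n(?!0[12][VM]|50|90)
This means:
match any character that is not a newline
or
a newline that is not followed (?!...) by the start of a new record, 50, or 90.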
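To make the explanation concrete, here is a minimal PowerShell sketch that applies the pattern through [regex]::Matches. The sample lines and the variable names ($lines, $text, $pattern, $records) are made up for illustration, and the pattern carries only the m option, since .NET does not accept g as an inline flag.

```powershell
# Hypothetical sample data, joined with LF so the result does not depend on the editor's line endings.
$lines = @(
    '00 date'
    '01Mfirst'
    '01 detail'
    '01Msecond'
    '01=9tail'
    '50 batch'
    '01Vthird'
    '01 more'
    '50 BatchTotal'
    '90 FILETotal'
)
$text = $lines -join "`n"

# (?m) lets ^ match at the start of every line.
# A newline is consumed only when the next line does not start a new record, 50, or 90.
$pattern = '(?m)^0[12][VM](?:[^\n]|\n(?!0[12][VM]|50|90))+'

# @() keeps the result an array even when there is a single match.
$records = @([regex]::Matches($text, $pattern) | ForEach-Object { $_.Value })

$records.Count   # number of records found
$records[-1]     # last record: the 01V line plus its continuation line
```

If the input file uses CRLF line endings, each matched line keeps its trailing \r, because [^\n] also matches \r. Trimming the records afterwards is one way to deal with that.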
Using your test data:
@'
00 date
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01
01
01
01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=5xxxxxxxxxxxxxxxxxxxxxxxxxxx
01Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01
01
01
01=0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01=9xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01[=10=]xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01xxxxxxxxxxxxxxxxxxxxxxxxxxx
50 xxxxxxxxxxxx BatchTotal
90 xxxxxxxxxxxx FILETotal
'@ | set-content testfile.txt
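# Read the whole file as one string so the regex can span lines
$Text = Get-Content ./testfile.txt -Raw
# The line break inside this here-string is part of the pattern
$regex = @'
(?ms)(01(?:M|V).+?)
(?:5|9)0.+?
'@
# Group 1 holds everything from 01M/01V up to the next 50/90 line
$Records =
    [regex]::Matches($Text,$regex) |
    foreach {$_.groups[1].value}
# Show the last record
$Records[-1]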
01Vxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$A xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01$B 0xxxxxxxxxxxxxxxxxxxx
01[=10=]xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01xxxxxxxxxxxxxxxxxxxxxxxxxxx
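One possible reason the original Select-String attempt appears to return the whole file: Select-String emits MatchInfo objects, and their default display shows the entire input string (the Line property), not just the matched part. The sketch below uses a shortened, made-up input and a simplified variant of the question's pattern (both are assumptions for illustration) to show where the matched text actually lives.

```powershell
# Hypothetical single-string input with embedded newlines.
$text = "00 date`n01Vaaa`n01 bbb`n50 total"

# (?s) lets . match newlines; -AllMatches collects every match in the string.
$result = $text | Select-String -Pattern '(?s)(?<=0[12][VM]).+?(?=\n50)' -AllMatches

$result.Line            # the whole input string, which is what the default output shows
$result.Matches.Value   # only the text matched by the pattern
```

Reading .Matches.Value, or calling [regex]::Matches directly as in the answers above, gives just the record text.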