How to include the line above the keyword line when splitting a big txt file using Regex and PowerShell

Question

We have a big txt file ("C:\temp\longmessages.txt") that looks like this:

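Americas

This is Start

some text 1

some text 2

some text 3

etc. etc

End

Europe

This is Start

some text 4

some text 5

some text 6

some text 7

etc. etc

End

Asia

This is Start

some text 8

some text 9

some text 10

etc. etc

End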

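Using the PS script below, I can split "C:\temp\longmessages.txt" into smaller files: 1.txt, 2.txt, 3.txt and so on. Each smaller .txt file runs from one "Start" line to the next "Start" line. However, each smaller file begins at the "Start" line itself and leaves out the line directly above "This is Start". We want that line (Americas, Europe, etc.) to be included at the top of each split file, above "Start".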

$InputFile = "C:\temp\longmessages.txt"
$Reader = New-Object System.IO.StreamReader($InputFile)
$a = 1
While (($Line = $Reader.ReadLine()) -ne $null) {
    If ($Line -match "START") {
  
       $OutputFile = "C:\temp\output$a.txt"
       $filename
  if ($filename -eq $null){
  
  $OutputFile = $filename
  }
       
        $a++
    }
     
     
    Add-Content $OutputFile $Line
  
}

Answer 1

Continuing from my comment, I think it would be a lot easier to do the splitting on the End line.

Try

$path  = 'C:\temp\longmessages.txt'
# create a List object to add lines to
# (New-Object is used instead of ::new() so this also runs on PowerShell versions before 5.0)
$lines = New-Object 'System.Collections.Generic.List[string]'
$count = 1

# use 'switch' to parse the log file line-by-line
switch -Regex -File $path {
    '^End$' { 
        # add 'End' to the list
        $lines.Add($_)
        # if the top line is empty or whitespace only, remove that line
        if ([string]::IsNullOrWhiteSpace($lines[0])) { $lines.RemoveAt(0) }
        # create the full name of the output file and increment the file counter
        # (the C:\temp\output folder must already exist; Set-Content does not create it)
        $OutputFile = 'C:\temp\output\{0}.txt' -f $count++
        # write the file
        $lines | Set-Content -Path $OutputFile -Force
        # clear the list for the next file
        $lines.Clear()
    }
    default { $lines.Add($_) }
}

Using your example, this produces three files:

1.txt

Americas

This is Start

some text 1

some text 2

some text 3

etc. etc

End

2.txt

Europe

This is Start

some text 4

some text 5

some text 6

some text 7

etc. etc

End

3.txt

Asia

This is Start

some text 8

some text 9

some text 10

etc. etc

End
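
If the split really has to happen on the Start line, as in the question's script, one option is to hold lines in a buffer and, whenever a Start line shows up, treat the last non-blank line in the buffer as the label that belongs to the new file. The following is only a minimal sketch under assumptions: the function name Split-OnStartLine, its parameters and the default pattern 'Start$' are made up for illustration and are not part of the original answer. Anything that appears before the first label line is dropped.

```powershell
# Hypothetical helper (not from the original answer): split on the 'Start' line,
# but keep the label line above it (Americas, Europe, ...) at the top of each file.
function Split-OnStartLine {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [Parameter(Mandatory = $true)][string]$OutputDirectory,
        [string]$StartPattern = 'Start$'   # assumed keyword pattern, adjust as needed
    )

    # Set-Content does not create missing folders, so create the target first
    if (-not (Test-Path -LiteralPath $OutputDirectory)) {
        New-Item -ItemType Directory -Path $OutputDirectory | Out-Null
    }

    $buffer  = New-Object 'System.Collections.Generic.List[string]'
    $count   = 0
    $started = $false

    switch -File $Path {
        { $_ -match $StartPattern } {
            # walk back to the last non-blank line: that is the label for the new chunk
            $labelIndex = $buffer.Count - 1
            while ($labelIndex -ge 0 -and $buffer[$labelIndex].Trim() -eq '') { $labelIndex-- }
            if ($labelIndex -lt 0) { $labelIndex = 0 }

            # everything before the label belongs to the previous chunk
            if ($started -and $labelIndex -gt 0) {
                $count++
                $buffer.GetRange(0, $labelIndex) |
                    Set-Content -Path (Join-Path $OutputDirectory "$count.txt")
            }
            $buffer.RemoveRange(0, $labelIndex)
            $buffer.Add($_)
            $started = $true
        }
        default { $buffer.Add($_) }
    }

    # write whatever is left as the last chunk
    if ($started -and $buffer.Count -gt 0) {
        $count++
        $buffer | Set-Content -Path (Join-Path $OutputDirectory "$count.txt")
    }

    # number of files written
    $count
}

# Example call, using the paths from the question:
# Split-OnStartLine -Path 'C:\temp\longmessages.txt' -OutputDirectory 'C:\temp\output'
```

Compared with splitting on End, this keeps the whole file's structure intact even when a section has no End line, at the cost of a slightly more involved look-back step.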

regex

powershell

powershell-2.0

powershell-3.0

powershell-4.0