Powershell3: discern and display last n Lines from an ascii file

Question

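I thought this would be simple. I write xcopy's logging output to a plain text file, and a daily delimiter (literally) "++++++++++++++++++++Tue 07/03/2018 0900 PM" is appended to the log file before each daily backup. So the last few lines of the file typically look like this:

Each new day appends a new delimiter line, and so on.

I want to display the last delimiter and the lines that follow it, up to EOF.

The approaches I tried with Get-Content and Select-String -Context 0,20 don't work.

PS says my search string ++++++++++++++++++++ is not a valid regular expression, that the path is not recognized, and so on. Any help?

Memory and time are not an issue. Sorry if this is too simple.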

Answer 1

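TL;DR: escape the + in your search, i.e. use "\+\+\+" and so on.

Background

Unfortunately, + is a reserved character in the regex world.

What is the meaning of + in a regex?

It tells the engine to match the preceding search operator (a character, a range, or a code representing a set of characters, such as \d for digits) one or more times. You can see more about this error in PowerShell by running the following: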

[regex]$x = "++++"

Returns:

Cannot convert value "++++" to type "System.Text.RegularExpressions.Regex". Error: "parsing "++++" - Quantifier {x,y} following nothing."
At line:1 char:1
+ [regex]$x = "++++"
+ ~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : MetadataError: (:) [], ArgumentTransformationMetadataException
    + FullyQualifiedErrorId : RuntimeException

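This says that the quantifier (+) follows nothing: there is no preceding element for it to repeat.

So we need to escape the + with \: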

[regex]$x = "\+\+\+\+"

$x.Match('++++')

This returns the following, an error-free match:

Groups   : {0}
Success  : True
Name     : 0
Captures : {0}
Index    : 0
Length   : 4
Value    : ++++

Improvement

If you know how many + characters there are, you can match on "\+{20}" if there are 20. Or, using the previous example:

[regex]$x = "\+{4}"

$x.Match('++++')
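
If typing the backslashes by hand feels error-prone, the escaping can be left to .NET. This is a minimal sketch, assuming the delimiter is exactly twenty + characters; the variable names and the sample lines are illustrative and not taken from the question:

```powershell
# Build the literal delimiter and let .NET produce the escaped regex for it.
$delimiter = '+' * 20
$pattern   = [regex]::Escape($delimiter)   # every + becomes \+

# Illustrative sample lines (hypothetical, not from a real log).
$sample = @(
    '++++++++++++++++++++Mon 07/02/2018 0900 PM'
    '0 File(s) copied'
)

# Regex match using the escaped pattern.
$regexHits = @($sample | Select-String -Pattern $pattern)

# Alternatively, -SimpleMatch treats the pattern as literal text,
# so no escaping is needed at all.
$literalHits = @($sample | Select-String -Pattern $delimiter -SimpleMatch)
```

Both calls should find only the header line. -SimpleMatch is probably the easier choice when no other regex features are needed.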

Answer 2

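mjsqu's helpful answer explains the need to escape the + characters as \+ so that they are treated as literals in a regex.

Thus, a regex that matches a header line - twenty + characters at the start of a line (^) - is: ^\+{20}

That said, if detecting the header line by the 20 + signs alone is sufficient, Get-Content -Delimiter - which supports only literals as delimiters - offers a simple and efficient solution (PSv3+; assumes the input file some.log is in the current directory, ./):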

 $headerPrefix = '+' * 20  # -> '++++++++++++++++++++'
 $headerPrefix + (Get-Content ./some.log -Delimiter $headerPrefix -Tail 1)

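-Delimiter uses the specified header-line signature to break the file into "lines" (the text between instances of the delimiter, which here means blocks of lines), and -Tail 1 returns the last "line" (block) by searching for it from the end of the file. Thanks to mjsqu for helping me arrive at this solution.

The following alternative solutions are regular-expression-based, which enables more sophisticated header-line matching.

Note: While none of the solutions below require reading the log file into memory as a whole, they do read through the entire file, not just from the end.

We can use the regex in a switch -regex -file statement to process all lines of the log file and collect the lines starting with and following the last ^\+{20} match; the code assumes the input file path ./some.log: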

# Process all lines in the log file and 
# collect each block's lines along the way in 
# array $lastBlockLines, which means that after 
# all lines have been processed, $lastBlockLines contains
# the *last* block's lines.
switch -regex -file ./some.log {
  '^\+{20}' { $lastBlockLines = @($_) } # start of new block, (re)initialize array
  default   { $lastBlockLines += $_ }   # add line to block
}

# Output the last block's lines.
$lastBlockLines
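
For reuse, the same switch-based idea can be wrapped in a function. This is only a sketch: the function name Get-LastLogBlock and its parameters are made up for illustration, and a generic List replaces the array so that += does not recreate the array for every line.

```powershell
# Hypothetical helper (name and parameters are illustrative):
# returns the last header line and everything after it.
function Get-LastLogBlock {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [string] $HeaderPattern = '^\+{20}'
    )

    # A List avoids re-creating an array on every appended line.
    $block = New-Object 'System.Collections.Generic.List[string]'

    switch -Regex -File $Path {
        $HeaderPattern { $block.Clear(); $block.Add($_) }  # new block starts: reset
        default        { $block.Add($_) }                  # line belongs to current block
    }

    # Emit the collected lines one by one.
    $block
}
```

Usage would be something like Get-LastLogBlock -Path ./some.log. Note that the whole file is still read from the top, as with the original snippet.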

Alternatively, if you're willing to assume a fixed maximum number of lines in a block, a single-pipeline solution using Select-String is possible:

Select-String '^\+{20}' ./some.log -Context 0,100 | Select-Object -Last 1 | 
  ForEach-Object { $_.Line; $_.Context.PostContext }

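Select-String '^\+{20}' ./some.log -Context 0,100 matches all header lines in file ./some.log and, thanks to -Context 0,100, includes (up to) 100 lines that follow the matching line in the match object it emits (0 means that no lines before the matching line are included).
Select-Object -Last 1 passes only the last match through.
ForEach-Object { $_.Line; $_.Context.PostContext } then outputs the last match's matching line as well as up to 100 lines that follow it.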
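
To make the "fixed maximum" caveat concrete, here is a small sketch that deliberately uses a context window that is too small. The temporary file and its contents are invented for the demonstration:

```powershell
# Create a throwaway log file (hypothetical content) with two blocks.
$tmp = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tmp -Value @(
    '++++++++++++++++++++Mon'
    'line 1'
    '++++++++++++++++++++Tue'
    'line 1'
    'line 2'
    'line 3'
)

# Only 2 lines of post-context are requested, although the last block has 3.
$match = Select-String -Pattern '^\+{20}' -Path $tmp -Context 0,2 |
    Select-Object -Last 1

# Header line plus whatever post-context was captured.
$block = @($match.Line) + @($match.Context.PostContext)

Remove-Item $tmp
$block
```

With this input the last block comes back without 'line 3', so the second -Context number needs to be at least as large as the longest block you expect.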

If you don't mind reading the file twice, you can combine Select-String with Get-Content ... | Select-Object -Skip:

Get-Content ./some.log | Select-Object -Skip (
    (Select-String '^\+{20}' ./some.log | Select-Object -Last 1).LineNumber - 1
  )

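This takes advantage of the fact that the match objects emitted by Select-String have a .LineNumber property that reflects the number of the line on which a given match was found. Passing the last match's line number minus 1 to Get-Content ... | Select-Object -Skip then outputs the matching line as well as all subsequent lines.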

Answer 3

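Personally, I would change that logging format to make it more object-friendly, and then work with it normally.

However, going by what you posted, here is one way to do it. I'm sure there are more elegant approaches, but this is q&d (quick and dirty). Also, as a military vet (20+ years) who still lives and works on military time: 0900 is 9:00 AM, and 2100 is 9:00 PM. 8^} ...just saying...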

# Get the lines in the file
($DataSet = Get-Content -Path '.\LogFile.txt')

# Results

++++++++++++++++++++Mon 07/02/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups Mon 07/02/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups\OutlookBak Mon 07/02/2018 0900 PM
++++++++++++++++++++Mon 07/03/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups Mon 07/02/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups\OutlookBak Mon 07/02/2018 0900 PM



# Get the index of the LastDateEntry, using a string match (RegEx)
($LastDateEntry = (Get-Content -Path '.\LogFile.txt' | %{$_ | Select-String -Pattern '[+].*'}) | Select -Last 1)

# Results

++++++++++++++++++++Mon 07/03/2018 0900 PM


# Get the LastDateEntryIndex
($DateIndex = (Get-Content -Path '.\LogFile.txt').IndexOf($LastDateEntry))

# Results

5



# Get the data using the index
ForEach($Line in $DataSet)
{
    If ($Line.ReadCount -ge $DateIndex)
    {
    Get-Content -Path '.\LogFile.txt' | Select-Object -Index ($Line.ReadCount)
    }
}

# Results

++++++++++++++++++++Mon 07/03/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups Mon 07/02/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups\OutlookBak Mon 07/02/2018 0900 PM
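
Once the index of the last header is known, the final loop could arguably be replaced by a plain array slice, which avoids re-reading the file for every line. A rough sketch, with a small in-memory array standing in for the Get-Content result (the variable names are illustrative):

```powershell
# Sample lines standing in for: $lines = @(Get-Content -Path '.\LogFile.txt')
$lines = @(
    '++++++++++++++++++++Mon 07/02/2018 0900 PM'
    '0 Files(s) copied'
    '++++++++++++++++++++Mon 07/03/2018 0900 PM'
    '0 Files(s) copied'
    ' Xcopy SUCCEEDED K:\ to J:\MyUSBBackups Mon 07/02/2018 0900 PM'
)

# Walk backwards to find the zero-based index of the last header line.
$lastHeaderIndex = -1
for ($i = $lines.Count - 1; $i -ge 0; $i--) {
    if ($lines[$i] -match '^\+{20}') { $lastHeaderIndex = $i; break }
}

# Slice from that index to the end; emit nothing if no header was found.
$lastBlock = if ($lastHeaderIndex -ge 0) {
    $lines[$lastHeaderIndex..($lines.Count - 1)]
} else {
    @()
}
$lastBlock
```

Wrapping Get-Content in @( ) matters for one-line files, where it would otherwise return a single string instead of an array.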

Answer 4

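Another approach, using a RegEx to split the file into sections.

Use Get-Content with the -Raw parameter to get one single string rather than an array of strings
Split the file, using a non-consuming positive lookahead, into sections
that start with 20 + characters, -split '(?=\+{20})', and that are not empty, -ne ''
Use the index [-1] to get the last section.

Sample output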

PS> ((Get-Content '.\LogFile.txt' -raw) -split '(?=\+{20})' -ne '')[-1]
++++++++++++++++++++Mon 07/03/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups Mon 07/02/2018 0900 PM
0 Files(s) copied
 Xcopy SUCCEEDED K:\ to J:\MyUSBBackups\OutlookBak Mon 07/02/2018 0900 PM
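
One possible refinement, offered as a sketch rather than a correction: anchoring the lookahead to the start of a line with (?m)^ keeps the split from firing in the middle of a line or inside a longer run of + characters. The inline string below is made-up sample data standing in for the Get-Content -Raw result:

```powershell
# Made-up sample text standing in for: Get-Content '.\LogFile.txt' -Raw
$text = "++++++++++++++++++++Mon`n0 File(s) copied`n" +
        "++++++++++++++++++++Tue`n1 File(s) copied`n"

# (?m) makes ^ match at the start of every line; the lookahead consumes
# nothing, so each section keeps its own header line.
$sections = @($text -split '(?m)(?=^\+{20})' -ne '')

# The last section is the most recent block.
$last = $sections[-1]
$last
```

The @( ) wrapper keeps $sections an array even when the file contains a single block, so that [-1] picks a section rather than a character.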


