Select-String from pattern A to pattern B

Question

Is there a way to use Select-String to find all the lines between X and Y?

For example, if I have a file with the following contents:

[line 157: Time 2015-08-04 11:34:00] 
<staff>
    <employee>
        <Name>Bob Smith</Name>
        <function>management</function>
        <age>39</age>
        <birthday>3rd June</birthday>
        <car>yes</car>
    </employee>
    <employee>
        <Name>Sam Jones</Name>
        <function>security</function>
        <age>24</age>
    </employee>
    <employee>
        <Name>Mark Perkins</Name>
        <function>management</function>
        <age>32</age>
    </employee>
</staff>

I want to find every block that contains <function>management</function>, so that I end up with:

<employee>
    <Name>Bob Smith</Name>
    <function>management</function>
    <age>39</age>
    <birthday>3rd June</birthday>
    <car>yes</car>
</employee>
<employee>
    <Name>Mark Perkins</Name>
    <function>management</function>
    <age>32</age>
</employee>

If all the groups were the same size, I could use something like:

Select-String -Pattern '<function>management</function>' -CaseSensitive -Context 2,2

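In practice, however, they will not all be the same size, so I cannot use a fixed number every time.

What I really need is a way of saying "return everything from":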

2 rows above my search term
until
the following '</employee>' field

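for every matching instance.

Is this possible?

I can't use the standard XML tools in PowerShell, because the file I am reading is not standard XML, which is why I included [line 157: Time 2015-08-04 11:34:00] in the example. The best way to picture it is as many XML files all merged into a single file, separated by [line . . .] headers.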

Additional information: I'm afraid my example was a little too simple. The actual file looks more like this:

[line 157: Time 2015-08-04 11:34:00]
<?xml version="1.0" encoding="utf-8"?>
<other>
    <stuff>
    . . .
    </stuff>
</other>

<?xml version="1.0" encoding="utf-8"?>
<staff>
    <employee>
    ...
    </employee>
</staff> 

<staff>
    <employee>
    ...
    </employee>
</staff>
[line End: Time 2015-08-04 11:34:00]

Additional information: I have added code to ignore the <?xml version . . . lines. I also tried adding my own root element:

$first = "<open>"
$last = "</open>"
$a = 0

. . .

if($a -eq 0)
    {
        $XmlFiles[$Index] += $first
        $a++
    } 

. . .

$XmlFiles[$Index] += $last

But this produces the error: Array assignment failed because index '-1' was out of range.
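
A likely cause of that error is an append to $XmlFiles[$Index] while $Index is still -1 and the array is still empty, for example when the opening tag is added before the first boundary line has created a slot. The following is only a minimal sketch under that assumption; the three input lines are hypothetical, and the elided parts of the snippet above are not reconstructed:

```powershell
# Hypothetical input: the first line appears before any "[line N" boundary
$Lines = @('<stray/>', '[line 1: Time 2015-08-04 11:34:00]', '<staff/>')

$XmlFiles = @()
$Index = -1

foreach ($Line in $Lines) {
    if ($Line -match '^\[line\ \d+') {
        # Boundary: create the slot first, already holding the opening tag
        $Index++
        $XmlFiles += '<open>'
    }
    elseif ($Index -ge 0) {
        # Append only once at least one slot exists
        $XmlFiles[$Index] += $Line
    }
    # Lines seen while $Index is still -1 are skipped rather than indexed
}

# Close the last section, guarding against a file with no boundary at all
if ($Index -ge 0) { $XmlFiles[$Index] += '</open>' }

$XmlFiles
```

Creating the array element before anything is appended to it means index -1 is never dereferenced on an empty array.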

Additional information: the final result looks like this:

$FilePath = "C:\Path\To\XmlDocs.txt"
$XmlFiles = @()
$Index = -1

$first = "<open>"
$last = "</open>"

# Go through the file and store the individual xml documents in a string array
$a=0
Get-Content $FilePath | `
%{
    if($_ -match "^\[line\ \d+")
        {
            if($a -eq 0)
                {
                    #if this is the top line, ignore it
                }
            else
                {
                    #if this is a boundary, add a closing </open> tag
                    $XmlFiles[$Index] += $last
                }
            # We've got a boundary, move to next index in array
            $Index++
            # Add a new string to hold the next xml document
            $XmlFiles += ""
            # Add an <open> tag
            $XmlFiles[$Index] += $first
            $a++
        } 
    elseif ($_ -match '^\<\?xml') #ignore xml headers
        {
            # End of Section, or XML Header. Do Nothing and move on
        }
    elseif([string]::IsNullOrEmpty($_))
        {
            # Blank Line, Do Nothing and move on
        }
    else 
        {
            # Add each line to the string (xml doesn't care about line breaks)
            $XmlFiles[$Index] += $_
        }
}

# add the final </open> tag
$XmlFiles[$Index] += $last

$Results = foreach($File in $XmlFiles)
{
    # Parse string as an Xml document
    $Xml = [xml]($File.Trim())
    # Use Xpath to find the manager
    $Xml.SelectNodes("//employee[function = 'management']") |% {$_}
}

$Results

It basically ignores the [line . . . headers, the <?xml declarations, and any blank lines, and wraps each section in an <open> . . . </open> tag to make it valid XML.
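
The wrapping step can be seen in isolation in the short sketch below. The $Section text is a hypothetical fragment with two sibling <staff> roots and a stray XML declaration. Without the synthetic <open> root, the [xml] cast would reject it, because an XML document may have only one root element.

```powershell
# Hypothetical fragment, as it might appear between two "[line ...]" headers
$Section = @'
<?xml version="1.0" encoding="utf-8"?>
<staff><employee><Name>Bob Smith</Name><function>management</function></employee></staff>
<staff><employee><Name>Sam Jones</Name><function>security</function></employee></staff>
'@

# Drop every XML declaration, then wrap what is left in one synthetic root
$Body = $Section -replace '<\?xml[^>]*\?>', ''
$Xml  = [xml]("<open>$Body</open>")

# Collect the names of all employees whose function is 'management'
$Names = @(
    $Xml.SelectNodes("//employee[function = 'management']") |
        ForEach-Object { $_.SelectSingleNode('Name').InnerText }
)

$Names
```

SelectSingleNode('Name').InnerText is used here instead of the .Name shortcut so that the element's text is read explicitly.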

Answer 1

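I think you are overestimating the difficulty of parsing the individual XML documents as actual XML. You can read the file line by line and use the "[line ...]" strings as boundaries between the individual documents: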

$FilePath = "C:\Path\To\XmlDocs.txt"
$XmlFiles = @()
$Index = -1

# Go through the file and store the individual xml documents in a string array
Get-Content $FilePath |%{
    if($_ -match "^\[line\ \d+"){
        # We've got a boundary, move to next index in array
        $Index++
        # Add a new string to hold the next xml document
        $XmlFiles += ""
    } else {
        # Add each line to the string (xml doesn't care about line breaks)
        $XmlFiles[$Index] += $_
    }
}

$Managers = foreach($File in $XmlFiles){
    # Parse string as an Xml document
    $Xml = [xml]$File
    # Use Xpath to find the manager
    $Xml.SelectNodes("//employee[function = 'management']") |% {$_}
}

Using a sample file like this (a modified/extended version of your example):

[line 157: Time 2015-08-04 11:34:00] 
<staff>
    <employee>
        <Name>Bob Smith</Name>
        <function>management</function>
        <age>39</age>
        <birthday>3rd June</birthday>
        <car>yes</car>
    </employee>
    <employee>
        <Name>Sam Jones</Name>
        <function>security</function>
        <age>24</age>
    </employee>
    <employee>
        <Name>Mark Perkins</Name>
        <function>management</function>
        <age>32</age>
    </employee>
</staff>
[line 158: Time 2015-08-06 12:36:30] 
<staff>
    <employee>
        <Name>Rob Smith</Name>
        <function>management</function>
        <age>39</age>
        <birthday>3rd June</birthday>
        <car>yes</car>
    </employee>
    <employee>
        <Name>Cam Jones</Name>
        <function>security</function>
        <age>24</age>
    </employee>
    <employee>
        <Name>Stark Perkins</Name>
        <function>management</function>
        <age>32</age>
    </employee>
</staff>

the resulting $Managers would be:

PS C:\> $Managers|Select Name,function,age

Name                               function                          age
----                               --------                          ---
Bob Smith                          management                        39
Mark Perkins                       management                        32
Rob Smith                          management                        39
Stark Perkins                      management                        32
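
If some sections cannot be made parseable as XML at all, a plain-text fallback that follows the literal "from pattern A to pattern B" idea is a lazy multi-line regex. This is only a sketch, not part of the approach above. It assumes <employee> elements are never nested and carry no attributes, and the sample text is hypothetical; in practice it could come from [System.IO.File]::ReadAllText($FilePath).

```powershell
# Hypothetical sample text standing in for the real file contents
$Text = @'
[line 157: Time 2015-08-04 11:34:00]
<staff>
    <employee>
        <Name>Bob Smith</Name>
        <function>management</function>
    </employee>
    <employee>
        <Name>Sam Jones</Name>
        <function>security</function>
    </employee>
</staff>
'@

# (?s) lets "." span line breaks; ".*?" is lazy, so each match stops at the
# first </employee> that follows its opening <employee>
$Blocks = @(
    [regex]::Matches($Text, '(?s)<employee>.*?</employee>') |
        Where-Object { $_.Value -cmatch '<function>management</function>' } |
        ForEach-Object { $_.Value }
)

$Blocks
```

The -cmatch operator keeps the comparison case-sensitive, mirroring the -CaseSensitive switch in the question's Select-String attempt.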

Tags: xml, powershell-2.0, select-string