Parsing HTML with <DIV> class to variable

I am trying to parse a server monitoring page that has no class names at all. The HTML file looks like this:

<div style="float:left;margin-right:50px"><div>Server:VIP Owner</div><div>Server Role:ACTIVE</div><div>Server State:AVAILABLE</div><div>Network State:GY</div>

How can I parse this HTML content into variables such as
$Server VIP Owner
$Server_Role Active
$Server_State Available

Because there are no class names, I am struggling to extract the values. This is what I tried:

$htmlcontent.ParsedHtml.getElementsByTagName('div') | ForEach-Object {
    New-Variable -Name $_.className -Value $_.textContent
}

Although you have only shown us a small part of the HTML, it most likely contains many more <div> tags.

Without an id attribute or anything else that uniquely identifies the div you are after, you can use a Where-Object clause to find the part you are looking for.

Try

$div = ($htmlcontent.ParsedHtml.getElementsByTagName('div') | Where-Object { $_.InnerHTML -like '<div>Server Name:*' }).outerText

# if you're on PowerShell version < 7.1, you need to replace the first colon on each line with an equals sign
$result = $div -replace '(?<!:.*):', '=' | ConvertFrom-StringData

# for PowerShell 7.1, you can use the `-Delimiter` parameter
#$result = $div | ConvertFrom-StringData -Delimiter ':'

The result is a hashtable like this:

Name                           Value
----                           -----
Server Name                    VIP Owner
Server State                   AVAILABLE
Server Role                    ACTIVE
Network State                  GY
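
To get from that hashtable to the individual variables asked for in the question, one option is to loop over its keys and create a variable for each, replacing the spaces in the key with underscores. A minimal sketch, assuming the text has already been extracted from the page: `$sampleText` is a hypothetical stand-in for `$div`, and `$parsed` is a name chosen here for illustration. It uses Set-Variable rather than New-Variable so the snippet can be re-run without "variable already exists" errors.

```powershell
# Stand-in for the outerText of the matched <div>; in the real script this would be $div
$sampleText = @'
Server Name:VIP Owner
Server Role:ACTIVE
Server State:AVAILABLE
Network State:GY
'@

# Turn the first colon on each line into '=' so ConvertFrom-StringData can parse it
$parsed = $sampleText -replace '(?<!:.*):', '=' | ConvertFrom-StringData

# Create one variable per key, e.g. 'Server Role' becomes $Server_Role
foreach ($key in $parsed.Keys) {
    Set-Variable -Name ($key -replace '\s+', '_') -Value $parsed[$key]
}

"$Server_Name / $Server_Role / $Server_State / $Network_State"
```

In most scripts it is easier to keep the hashtable and read `$parsed['Server Role']` directly; separate variables are mainly useful when the names are known in advance.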

Of course, if the report contains more blocks like this, you will have to loop over the divs with something like this:

$result = ($htmlcontent.ParsedHtml.getElementsByTagName('div') | Where-Object { $_.InnerHTML -like '<div>Server Name:*' }) | Foreach-Object {
    $_.outerText -replace '(?<!:.*):', '=' | ConvertFrom-StringData
}

OK, so the original question did not make clear what we are dealing with..
Apparently, your HTML contains divs like this:

  <div>=======================================</div>
  <div>Service Name:MysqlReplica</div>
  <div>Service Status:RUNNING</div>
  <div>Remarks:Change role completed in 1 ms</div>
  <div>=======================================</div>
  <div>Service Name:OCCAS</div>
  <div>Service Status:RUNNING</div>
  <div>Remarks:Change role completed in 30280 ms</div>

To handle blocks like these, you need a completely different approach:

# create a List object to store the results
$result  = [System.Collections.Generic.List[object]]::new()
# create a temporary ordered dictionary to build the resulting items
$svcHash = [ordered]@{}

foreach ($div in $htmlcontent.ParsedHtml.getElementsByTagName('div')) {
    switch -Regex ($div.InnerText) {
        '^=+' { 
            if ($svcHash.Count) {
                # add the completed object to the list
                $result.Add([PsCustomObject]$svcHash)
                $svcHash = [ordered]@{}
            }
        }
        '^(Service .+|Remarks):' { 
            # split into the property Name and its value
            $name, $value = ($_ -split ':',2).Trim() 
            $svcHash[$name] = $value 
        }
    }
}
if ($svcHash.Count) {
    # a final service block is still pending; this happens when the HTML has no closing
    #   <div>=======================================</div>
    # after the last block, so add it to the result list here
    $result.Add([PsCustomObject]$svcHash)
}

# output on screen
$result | Format-Table -AutoSize

# output to CSV file
$result | Export-Csv -Path 'X:\services.csv' -NoTypeInformation

With the example above, the on-screen output is:

Service Name Service Status Remarks                          
------------ -------------- -------                          
MysqlReplica RUNNING        Change role completed in 1 ms    
OCCAS        RUNNING        Change role completed in 30280 ms
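
Note that the `ParsedHtml` property only exists in Windows PowerShell; in PowerShell 6 and later, Invoke-WebRequest does not return a parsed DOM. Below is a rough sketch of the same block-grouping logic working on the raw HTML text instead. It assumes every record sits in a simple `<div>…</div>` with no nested tags inside it; `$html` is a stand-in for `$htmlcontent.Content`, and the regex reflects that assumed page layout rather than being a general HTML parser.

```powershell
# Stand-in for $htmlcontent.Content (the raw HTML string)
$html = @'
<div>=======================================</div>
<div>Service Name:MysqlReplica</div>
<div>Service Status:RUNNING</div>
<div>Remarks:Change role completed in 1 ms</div>
<div>=======================================</div>
<div>Service Name:OCCAS</div>
<div>Service Status:RUNNING</div>
<div>Remarks:Change role completed in 30280 ms</div>
'@

$result  = [System.Collections.Generic.List[object]]::new()
$svcHash = [ordered]@{}

# grab the text of every leaf <div>, i.e. one that contains no further tags
foreach ($divMatch in [regex]::Matches($html, '(?i)<div[^>]*>([^<]*)</div>')) {
    $text = $divMatch.Groups[1].Value.Trim()
    if ($text -match '^=+$') {
        # separator line: store the block collected so far, if any
        if ($svcHash.Count) {
            $result.Add([PsCustomObject]$svcHash)
            $svcHash = [ordered]@{}
        }
    }
    elseif ($text -match '^[^:]+:') {
        # split into the property name and its value at the first colon
        $name, $value = ($text -split ':', 2).Trim()
        $svcHash[$name] = $value
    }
}
# store the last block, which has no trailing separator
if ($svcHash.Count) { $result.Add([PsCustomObject]$svcHash) }

$result | Format-Table -AutoSize
```

Text extracted this way is not entity-decoded, so if the page contains sequences such as `&amp;` you may want to pass each value through `[System.Net.WebUtility]::HtmlDecode()`.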