Parsing HTML with <DIV> class to variable
I am trying to parse a server monitoring page that has no class names at all. The HTML file looks like this:
<div style="float:left;margin-right:50px"><div>Server:VIP Owner</div><div>Server Role:ACTIVE</div><div>Server State:AVAILABLE</div><div>Network State:GY</div>
How can I parse this HTML content into variables such as
$Server VIP Owner
$Server_Role Active
$Server_State Available
Because there are no class names, I am struggling to extract it. This is what I tried:
$htmlcontent.ParsedHtml.getElementsByTagName('div') | ForEach-Object {
    New-Variable -Name $_.className -Value $_.textContent
}
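Although you have shown us only a small part of the HTML, there are most likely more <div> tags in it.
Without an id attribute or anything else that uniquely identifies the div you are after, you can use a Where-Object clause to find the part you are looking for.
Try: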
$div = ($htmlcontent.ParsedHtml.getElementsByTagName('div') | Where-Object { $_.InnerHTML -like '<div>Server Name:*' }).outerText
# if you're on PowerShell version < 7.1, you need to replace the (first) colons into equal signs
$result = $div -replace '(?<!:.*):', '=' | ConvertFrom-StringData
# for PowerShell 7.1, you can use the `-Delimiter` parameter
#$result = $div | ConvertFrom-StringData -Delimiter ':'
The result is a hashtable like this:
Name                           Value
----                           -----
Server Name                    VIP Owner
Server State                   AVAILABLE
Server Role                    ACTIVE
Network State                  GY
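Since the question asks for individual variables rather than a hashtable, here is a minimal sketch of one way to get them. It assumes the text has already been extracted (the here-string stands in for the .outerText value), and the underscore naming is only an assumption based on the names shown in the question.

```powershell
# Stand-in for the .outerText of the matching div: one "Label:Value" pair per line
$div = @'
Server Name:VIP Owner
Server Role:ACTIVE
Server State:AVAILABLE
Network State:GY
'@

# Replace only the first colon on each line, then parse into a hashtable
$result = $div -replace '(?<!:.*):', '=' | ConvertFrom-StringData

# Create one variable per key, turning spaces into underscores for easier use
foreach ($key in $result.Keys) {
    Set-Variable -Name ($key -replace '\s+', '_') -Value $result[$key]
}

$Server_Role    # ACTIVE
```

In most scripts it is simpler to keep the hashtable and read $result['Server Role'] directly; separate variables are only worth it if later code really expects them.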
Of course, if the report contains more blocks like this, you will have to loop over the divs with something like this:
$result = ($htmlcontent.ParsedHtml.getElementsByTagName('div') | Where-Object { $_.InnerHTML -like '<div>Server Name:*' }) | Foreach-Object {
    $_.outerText -replace '(?<!:.*):', '=' | ConvertFrom-StringData
}
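The loop above yields one hashtable per matching block. As a sketch of how that could be made easier to display or export, each hashtable can be cast to an object. The two text blocks below are hypothetical stand-ins for the .outerText values, and the second server is invented purely for illustration.

```powershell
# Hypothetical stand-ins for the outerText of two matching divs
$blocks = @(
    "Server Name:VIP Owner`nServer Role:ACTIVE",
    "Server Name:Standby`nServer Role:STANDBY"
)

$result = $blocks | ForEach-Object {
    # Same colon-to-equals trick as above, then cast the hashtable to an object
    [PsCustomObject]($_ -replace '(?<!:.*):', '=' | ConvertFrom-StringData)
}

$result | Format-Table -AutoSize
```

Because a plain hashtable is unordered, the column order of the resulting objects is not guaranteed.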
OK, so the original question did not show what we are really dealing with.
Apparently, your HTML contains divs like this:
<div>=======================================</div>
<div>Service Name:MysqlReplica</div>
<div>Service Status:RUNNING</div>
<div>Remarks:Change role completed in 1 ms</div>
<div>=======================================</div>
<div>Service Name:OCCAS</div>
<div>Service Status:RUNNING</div>
<div>Remarks:Change role completed in 30280 ms</div>
To process blocks like these, you need a completely different approach:
# create a List object to store the results
$result = [System.Collections.Generic.List[object]]::new()
# create a temporary ordered dictionary to build the resulting items
$svcHash = [ordered]@{}
foreach ($div in $htmlcontent.ParsedHtml.getElementsByTagName('div')) {
    switch -Regex ($div.InnerText) {
        '^=+' {
            if ($svcHash.Count) {
                # add the completed object to the list
                $result.Add([PsCustomObject]$svcHash)
                $svcHash = [ordered]@{}
            }
        }
        '^(Service .+|Remarks):' {
            # split into the property Name and its value
            $name, $value = ($_ -split ':',2).Trim()
            $svcHash[$name] = $value
        }
    }
}
if ($svcHash.Count) {
    # A final service block is still pending. This happens when the HTML has no closing
    # <div>=======================================</div>
    # after the last block, so we need to add it to the list of objects ourselves
    $result.Add([PsCustomObject]$svcHash)
}
# output on screen
$result | Format-Table -AutoSize
# output to CSV file
$result | Export-Csv -Path 'X:\services.csv' -NoTypeInformation
Using the example above, the output on screen is:
Service Name Service Status Remarks
------------ -------------- -------
MysqlReplica RUNNING        Change role completed in 1 ms
OCCAS        RUNNING        Change role completed in 30280 ms
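The ParsedHtml property is only available in Windows PowerShell; in PowerShell 7, Invoke-WebRequest no longer exposes it. As a rough sketch for that case, the same block-collecting logic can be run over the raw HTML text by pulling out the simple <div> elements with a regex. This is a swapped-in technique, not the DOM approach above: it assumes the divs of interest have no attributes and no nested tags, and the here-string is a stand-in for the raw page text (for example $htmlcontent.Content).

```powershell
# Stand-in for the raw HTML text of the page
$html = @'
<div>=======================================</div>
<div>Service Name:MysqlReplica</div>
<div>Service Status:RUNNING</div>
<div>Remarks:Change role completed in 1 ms</div>
<div>=======================================</div>
<div>Service Name:OCCAS</div>
<div>Service Status:RUNNING</div>
<div>Remarks:Change role completed in 30280 ms</div>
'@

$result  = [System.Collections.Generic.List[object]]::new()
$svcHash = [ordered]@{}

# Match only leaf divs (no attributes, no nested tags) and capture their text
foreach ($match in [regex]::Matches($html, '<div>([^<]*)</div>')) {
    $text = [System.Net.WebUtility]::HtmlDecode($match.Groups[1].Value).Trim()
    switch -Regex ($text) {
        '^=+' {
            # a separator line closes the block collected so far
            if ($svcHash.Count) {
                $result.Add([PsCustomObject]$svcHash)
                $svcHash = [ordered]@{}
            }
        }
        '^(Service .+|Remarks):' {
            # split on the first colon only, so colons in the value survive
            $name, $value = ($_ -split ':', 2).Trim()
            $svcHash[$name] = $value
        }
    }
}
# add the last block if no separator followed it
if ($svcHash.Count) { $result.Add([PsCustomObject]$svcHash) }

$result | Format-Table -AutoSize
```

A regex is fragile against real-world HTML, so this is only reasonable for a page whose markup is as regular as the sample shown here.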