Get Rows and Columns of an embedded table within another table via HTML Agility Pack
VB 2012 with HTML Agility Pack.
I have spent a few hours trying to figure this out, and it comes down to my ignorance of the input format. Here is the situation. This is my input: a simple HTML table with two other tables embedded in it.
<table cellpadding="0" cellspacing="0" border="0">
<tr>
<td width="100%">
<table cellpadding="0" cellspacing="0" border="0" class="plan">
<tr>
<td class="textBold" valign="bottom">XX <u>999</u></td>
<td class="centerText" valign="bottom">X1</td>
<td class="centerText" valign="bottom">X2</td>
<td class="centerText" valign="bottom">X3</td>
<td class="centerText" valign="bottom">X4</td>
<td class="centerText" valign="bottom">X5</td>
<td class="centerTextTotal" valign="bottom">TOTAL</td>
</tr>
<tr>
<td class="Text">PRIMARY</td>
<td class="centerText">4</td>
<td class="centerText">8</td>
<td class="centerText"> </td>
<td class="centerText">1</td>
<td class="centerText">3</td>
<td class="centerTextTotal">16</td>
</tr>
<tr>
<td class="TextColor">SECONDARY</td>
<td class="centerTextColor"> </td>
<td class="centerTextColor"> </td>
<td class="centerTextColor">2</td>
<td class="centerTextColor"> </td>
<td class="centerTextColor">2</td>
<td class="centerTextTotal">4</td>
</tr>
<tr>
<td class="TextTotal">TOTAL</td>
<td class="centerTextTotal">4</td>
<td class="centerTextTotal">8</td>
<td class="centerTextTotal">2</td>
<td class="centerTextTotal">1</td>
<td class="centerTextTotal">5</td>
<td class="centerTextTotal">20</td>
</tr>
</table>
</td>
</tr>
<tr>
<td width="100%">
<table cellpadding="0" cellspacing="0" border="0" width="100%">
<tr>
<td width="75%" class="" textcolorvalign="bottom">Number of fuelings:0</td>
<td width="25%" class="" textcolorvalign="bottom" align="right">Meals:2</td>
</tr>
</table>
</td>
</tr>
</table>
I am only interested in the data in the inner table "plan".
Dim html As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument
html.OptionOutputAsXml = False
html.LoadHtml(htmlTable)
Dim docNode As HtmlAgilityPack.HtmlNode = html.DocumentNode
'parse the plan table if it exists
If docNode IsNot Nothing Then
    Dim hTable As HtmlAgilityPack.HtmlNode = docNode.SelectSingleNode("//table[@class='plan']")
    If hTable IsNot Nothing Then
        For Each hRow As HtmlAgilityPack.HtmlNode In hTable.SelectNodes("//table[@class='plan']//tr") '"//tr"
            Debug.Print(" InnerText=>[{0}] InnerHtml=>[{1}]", hRow.InnerText, hRow.InnerHtml)
            For Each hCol As HtmlAgilityPack.HtmlNode In hRow.SelectNodes("//table[@class='plan']//tr//td") '"//td"
                Debug.Print(" InnerText=>[{0}] InnerHtml=>[{1}]", hCol.InnerText, hCol.InnerHtml)
            Next hCol
        Next hRow
    End If
End If
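The trailing comments on the right show the strings I originally used, //tr and //td. My logic was that, since I am working with the nodes hTable and hRow, I would get their respective child nodes. However, that seems to return all rows and all columns of all tables. After testing, it seems I have to fully qualify each loop with //table[@class='plan']//tr and //table[@class='plan']//tr//td. Why is that? It makes no sense to me, since I am explicitly using the child node objects hTable and hRow.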
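In XPath, an expression that starts with // searches from the document root, regardless of which node you call SelectNodes on. To search from the current context node, you need .// instead. So try .//tr and .//td to search relative to the current element.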
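For illustration, here is a minimal sketch of the loops rewritten with relative paths. The module name `PlanTableDemo` and the shortened `htmlTable` string are assumptions made up for this example, not the original input; the `Nothing` checks are there because `SelectNodes` returns `Nothing` when no node matches.

```vb
Imports System
Imports HtmlAgilityPack

Module PlanTableDemo
    Sub Main()
        ' Hypothetical, shortened input: an outer table that wraps
        ' the "plan" table and a second, unrelated table.
        Dim htmlTable As String =
            "<table><tr><td>" &
            "<table class=""plan""><tr><td>PRIMARY</td><td>4</td></tr></table>" &
            "</td></tr><tr><td>" &
            "<table><tr><td>Meals:2</td></tr></table>" &
            "</td></tr></table>"

        Dim html As New HtmlDocument()
        html.LoadHtml(htmlTable)

        ' A leading // is fine here: we really do want to search the whole document.
        Dim hTable As HtmlNode = html.DocumentNode.SelectSingleNode("//table[@class='plan']")
        If hTable Is Nothing Then Return

        ' .//tr is evaluated relative to hTable, so only its rows are returned.
        Dim hRows As HtmlNodeCollection = hTable.SelectNodes(".//tr")
        If hRows Is Nothing Then Return

        For Each hRow As HtmlNode In hRows
            ' .//td is evaluated relative to the current row.
            Dim hCols As HtmlNodeCollection = hRow.SelectNodes(".//td")
            If hCols Is Nothing Then Continue For

            For Each hCol As HtmlNode In hCols
                Console.WriteLine("InnerText=>[{0}]", hCol.InnerText)
            Next hCol
        Next hRow
    End Sub
End Module
```

Note that .// still matches descendants at any depth. If the plan table could itself contain nested tables, ./tr and ./td would limit the match to direct children, assuming the markup has no tbody element between the table and its rows.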