Parsing HTML tables with different numbers of rows
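I'm trying to parse HTML tables, but the tables are not uniform: they have different numbers of rows. All of the tables sit under a (form) element, which I select as a SingleNode, but not every (tbody) row contains (td) cells, so I can't loop over all of the (td) elements.
Part of the HTML code: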
<form name="DetailsForm" method="post" action="">
<input type="hidden" name="helpPageId" value="WF03">
<input type="hidden" name="withMenu" value="1">
<table width="100%" cellspacing="0" border="0">
<tbody>
<tr valign="center">
<td class="blackHeadingLeft">Details</td>
</tr>
<tr></tr>
<tr>
<td></td>
</tr>
</tbody>
</table>
<table width="100%" cellspacing="0" border="0">
<tbody>
<tr>
<td class="whiteTd" height="21"> AWB:</td>
<td class="whiteTdNormal" nowrap="nowrap" height="21"> 7777995585 </td>
<td class="whiteTd" nowrap="nowrap" height="21"> No of Shipment Details:</td>
<td class="whiteTdNormal" nowrap="nowrap" height="21"> 1 </td>
<td class="whiteTdNormal" width="100%" height="21"> </td>
</tr>
</tbody>
</table>
<table class="bordered-table" width="100%" border="0">
<tbody>
<tr>
<td class="grayTd" width="5%" height="21"> Details</td>
<td class="grayTd" width="5%" height="21" align="center"> Orig</td>
<td class="grayTd" width="8%" height="21" align="center"> Location</td>
<td class="grayTd" width="7%" height="21"> Dest</td>
<td class="grayTd" width="5%" height="21" align="center"> Pcs</td>
<td class="grayTd" width="5%" height="21"> Weight(kg)</td>
<td class="grayTd" width="11%" height="21"> Volumetric Weight(kg)</td>
<td class="grayTd" width="9%" height="21"> Date/Time</td>
<td class="grayTd" width="8%" height="21"> Route/Cycle</td>
<td class="grayTd" width="8%" height="21"> Post Code</td>
<td class="grayTd" width="6%" height="21"> Product</td>
<td class="grayTd" width="9%" height="21"> Amount</td>
<td class="grayTd" width="9%" height="21"> Duplicate</td>
</tr>
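Here is how I was able to do it:
using System;
using System.Linq;
using HtmlAgilityPack;

// html is assumed to hold the page markup as a string.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);

// SelectNodes returns null when nothing matches, so fall back to an empty sequence.
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table") ?? Enumerable.Empty<HtmlNode>())
{
    Console.WriteLine("Table: ");
    foreach (HtmlNode tbody in table.SelectNodes("tbody") ?? Enumerable.Empty<HtmlNode>())
    {
        // Skip a tbody that has no rows.
        if (tbody.ChildNodes.Any(x => x.Name == "tr"))
        {
            Console.WriteLine("TBody: ");
            foreach (HtmlNode row in tbody.SelectNodes("tr"))
            {
                Console.WriteLine("TR: ");
                // Skip a row that has no cells, such as <tr></tr>.
                if (row.ChildNodes.Any(c => c.Name == "td"))
                {
                    foreach (var cell in row.SelectNodes("td"))
                    {
                        Console.WriteLine("TD: ");
                        Console.WriteLine(cell.InnerHtml);
                    }
                }
                Console.WriteLine();
            }
        }
    }
}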
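This way it doesn't matter how many tr or td tags there are. One thing to note: a tbody may contain no tr tags, and a tr may contain no td tags, so you have to validate for those cases.
Hope it helps.
Edited to include validation for the tr and td tags. Similar logic can be used for any other tags that might be missing.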
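The validation described above can also be avoided by using calls that return an empty sequence instead of null. The following is a minimal sketch of that idea, not a definitive implementation: TableReader and ReadTables are hypothetical names introduced here for illustration, and the sketch assumes you want the trimmed text of each cell rather than its InnerHtml. It relies on Descendants and Elements from HtmlAgilityPack, which yield nothing when there are no matching nodes, so rows with different numbers of cells (including empty rows) need no special handling.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableReader
{
    // Hypothetical helper: returns tables -> rows -> trimmed cell text.
    public static List<List<List<string>>> ReadTables(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so fall back to an empty sequence.
        IEnumerable<HtmlNode> tables =
            doc.DocumentNode.SelectNodes("//table") ?? Enumerable.Empty<HtmlNode>();

        return tables
            .Select(table => table
                // Descendants finds rows whether or not a tbody wraps them.
                // Note: it also picks up rows of any nested tables.
                .Descendants("tr")
                .Select(row => row
                    // Elements yields an empty sequence for a row without td children.
                    .Elements("td")
                    .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                    .ToList())
                .ToList())
            .ToList();
    }
}
```

Calling TableReader.ReadTables(html) gives one list per table, and each inner list holds as many cells as that row actually has, so an empty row such as <tr></tr> simply becomes an empty list.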