Parsing HTML tables with different row numbers

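I am trying to parse HTML tables, but the tables are not uniform: their rows contain different numbers of cells. All of the tables are under a (form), and I selected the (form) as a SingleNode, but not every row under (tbody) contains (td) elements, so I cannot loop over all of the (td) cells.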

Part of the HTML code:

<form name="DetailsForm" method="post" action="">
  <input type="hidden" name="helpPageId" value="WF03">
    <input type="hidden" name="withMenu" value="1">
      <table width="100%" cellspacing="0" border="0">
        <tbody>
          <tr valign="center">
            <td class="blackHeadingLeft">Details</td>
          </tr>
          <tr></tr>
          <tr>
            <td></td>
          </tr>
        </tbody>
      </table>
      <table width="100%" cellspacing="0" border="0">
        <tbody>
          <tr>
            <td class="whiteTd" height="21">&nbsp;AWB:</td>
            <td class="whiteTdNormal" nowrap="nowrap" height="21">&nbsp; 7777995585 </td>
            <td class="whiteTd" nowrap="nowrap" height="21">&nbsp;No of Shipment Details:</td>
            <td class="whiteTdNormal" nowrap="nowrap" height="21">&nbsp; 1 </td>
            <td class="whiteTdNormal" width="100%" height="21">&nbsp;</td>
          </tr>
        </tbody>
      </table>
      <table class="bordered-table" width="100%" border="0">
        <tbody>
          <tr>
            <td class="grayTd" width="5%" height="21">&nbsp;Details</td>
            <td class="grayTd" width="5%" height="21" align="center">&nbsp;Orig</td>
            <td class="grayTd" width="8%" height="21" align="center">&nbsp;Location</td>
            <td class="grayTd" width="7%" height="21">&nbsp;Dest</td>
            <td class="grayTd" width="5%" height="21" align="center">&nbsp;Pcs</td>
            <td class="grayTd" width="5%" height="21">&nbsp;Weight(kg)</td>
            <td class="grayTd" width="11%" height="21">&nbsp;Volumetric Weight(kg)</td>
            <td class="grayTd" width="9%" height="21">&nbsp;Date/Time</td>
            <td class="grayTd" width="8%" height="21">&nbsp;Route/Cycle</td>
            <td class="grayTd" width="8%" height="21">&nbsp;Post Code</td>
            <td class="grayTd" width="6%" height="21">&nbsp;Product</td>
            <td class="grayTd" width="9%" height="21">&nbsp;Amount</td>
            <td class="grayTd" width="9%" height="21">&nbsp;Duplicate</td>
          </tr>

Here is how I was able to do it:

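        // Needs: using System; using System.Linq; using HtmlAgilityPack;
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so fall back to an empty sequence.
        foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table") ?? Enumerable.Empty<HtmlNode>())
        {
            Console.WriteLine("Table: ");
            foreach (HtmlNode tbody in table.SelectNodes("tbody") ?? Enumerable.Empty<HtmlNode>())
            {
                // Skip a tbody that has no rows.
                if (tbody.ChildNodes.Any(x => x.Name == "tr"))
                {
                    Console.WriteLine("TBody: ");
                    foreach (HtmlNode row in tbody.SelectNodes("tr"))
                    {
                        Console.WriteLine("TR: ");
                        // Skip a row that has no cells, e.g. <tr></tr>.
                        if (row.ChildNodes.Any(c => c.Name == "td"))
                        {
                            foreach (HtmlNode cell in row.SelectNodes("td"))
                            {
                                Console.WriteLine("TD: ");
                                Console.WriteLine(cell.InnerHtml);
                            }
                        }

                        Console.WriteLine();
                    }
                }
            }
        }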

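This way it does not matter how many tr or td tags there are. One thing to note: a tbody may contain no tr tags, or a tr may contain no td tags, and those cases have to be checked for before looping.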

Hope this helps.


Edited to include validation for the tr and td tags. Similar logic can be used for any other tags that might be missing.
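
If you would rather collect the cell values than print them, the same checks can be folded into a shorter form. This is only a sketch, assuming the HtmlAgilityPack package is referenced; TableReader and ReadRows are made-up names for illustration, not part of the library. It selects rows with "//table//tr", so it also covers tables written without a tbody, and it relies on Elements("td") yielding an empty sequence for a row that has no cells.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableReader
{
    // Hypothetical helper: returns one list of cell texts per <tr>,
    // across every table in the document.
    public static List<List<string>> ReadRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<List<string>>();

        // SelectNodes returns null when nothing matches, so fall back to an empty sequence.
        IEnumerable<HtmlNode> trNodes =
            doc.DocumentNode.SelectNodes("//table//tr") ?? Enumerable.Empty<HtmlNode>();

        foreach (HtmlNode tr in trNodes)
        {
            // Elements("td") only looks at direct children and yields nothing
            // for a row such as <tr></tr>, so no separate check is needed here.
            List<string> cells = tr.Elements("td")
                .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                .ToList();
            rows.Add(cells);
        }

        return rows;
    }

    public static void Main()
    {
        // Shortened sample modelled on the HTML in the question.
        const string html =
            "<table><tbody><tr><td>Details</td></tr><tr></tr><tr><td></td></tr></tbody></table>" +
            "<table><tr><td>&nbsp;AWB:</td><td>&nbsp; 7777995585 </td></tr></table>";

        foreach (List<string> cells in ReadRows(html))
        {
            Console.WriteLine($"{cells.Count} cell(s): {string.Join(" | ", cells)}");
        }
    }
}
```

Rows without cells come back as empty lists rather than being dropped, so the row count still matches the HTML. If the tables can be nested, "//table//tr" will also return the inner tables' rows, which may or may not be what you want.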