How to parse <br> tags in a table using HtmlAgilityPack?

I have an HTML table whose cell values are separated by <br> tags.

<TABLE class=a12 cellSpacing=0 cols=8 cellPadding=0 border=1>
<TBODY>
    <TR>
        <TD style="WIDTH: 20.32mm"></TD>
        <TD style="WIDTH: 34mm"></TD>
        <TD style="WIDTH: 34mm"></TD>
        <TD style="WIDTH: 34mm"></TD>
        <TD style="WIDTH: 34mm"></TD>
        <TD style="WIDTH: 34mm"></TD>
        <TD style="WIDTH: 34mm"></TD>
        <TD style="WIDTH: 34mm"></TD>
    </TR>
    <TR style="HEIGHT: 5.08mm">
        <TD class=a23><DIV class=r11>Hrs</DIV></TD>
        <TD class=a24><DIV class=r11>MON</DIV></TD>
        <TD class=a25><DIV class=r11>TUE</DIV></TD>
        <TD class=a26><DIV class=r11>WED</DIV></TD>
        <TD class=a27><DIV class=r11>THU</DIV></TD>
        <TD class=a28><DIV class=r11>FRI</DIV></TD>
        <TD class=a29><DIV class=r11>SAT</DIV></TD>
        <TD class=a30><DIV class=r11>SUN</DIV></TD>
    </TR>
    <TR style="HEIGHT: 14.7mm">
        <TD class=a59><DIV class=r11>00:00</DIV></TD>
        <TD class=a60><DIV class=r11>FGH<BR>BM</DIV></TD>
        <TD class=a61><DIV class=r11>RFG8<BR>MFT5</DIV></TD>
        <TD class=a62><DIV class=r11>V5B6<BR>FG</DIV></TD>
        <TD class=a63><DIV class=r11>VB2N<BR>BN</DIV></TD>
        <TD class=a64><DIV class=r11>DFG21</DIV></TD>
        <TD class=a65><DIV class=r11>FGH<BR>MD20<BR>DHB0</DIV></TD>
        <TD class=a66><DIV class=r11>FD6<BR>HT7H4</DIV></TD>
    </TR>
    <TR style="HEIGHT: 14.7mm">
        <TD class=a59><DIV class=r11>02:00</DIV></TD>
        <TD class=a60><DIV class=r11>VN</DIV></TD>
        <TD class=a61><DIV class=r11>RTY<BR>MHF</DIV></TD>
        <TD class=a62><DIV class=r11>V5B6<BR>FG</DIV></TD>
        <TD class=a63><DIV class=r11>ZXC<BR>FHF</DIV></TD>
        <TD class=a64><DIV class=r11>DFG21<BR>GH<BR>PKJK</DIV></TD>
        <TD class=a65><DIV class=r11>FGH<BR>MD20</DIV></TD>
        <TD class=a66><DIV class=r11>FFG<BR>HFG4</DIV></TD>
    </TR>
    <TR style="HEIGHT: 14.7mm">
        <TD class=a59><DIV class=r11>04:00</DIV></TD>
        <TD class=a60><DIV class=r11>VNFG</DIV></TD>
        <TD class=a61><DIV class=r11>RTY<BR>MHF<br>T54</DIV></TD>
        <TD class=a62><DIV class=r11>CNFG</DIV></TD>
        <TD class=a63><DIV class=r11>QFCF<BR>FHF</DIV></TD>
        <TD class=a64><DIV class=r11>DFG21<BR>GH67</DIV></TD>
        <TD class=a65><DIV class=r11>SDF<BR>DFH</DIV></TD>
        <TD class=a66><DIV class=r11>CXV<BR>HFG4</DIV></TD>
    </TR>
</TBODY>
</TABLE>

I tried converting the HTML table to a DataTable, but the values within each cell come out concatenated.

How can I parse the <br> tags so that the cell values are separated by commas instead of being run together?

Private Function ParseTable(doc As HtmlDocument) As DataTable
    Dim result As New DataTable()
    Dim TableClassA12 As HtmlNode = doc.DocumentNode.SelectSingleNode("//table[@class='a12']")
    Dim rows = TableClassA12.Descendants("tr")
    Dim header = rows.Skip(1).First()

    For Each column In header.Descendants("td")
        result.Columns.Add(New DataColumn(column.InnerText.Trim, GetType(String)))
    Next

    For Each row In rows.Skip(2)
        Dim data = New List(Of String)()
        For Each column In row.Descendants("td")
            Dim cellText As String = column.InnerText.Trim
            data.Add(cellText)
        Next
        If data.Count > 0 Then
            result.Rows.Add(data.ToArray())
        End If
    Next
    Return result
End Function

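To make the smallest change to your existing code, you can select the div inside each td and read its InnerHtml, which gives you the inner text together with the <br> tags. At that point you can simply replace the <br> tags with commas: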

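For Each column In row.Descendants("td").SelectMany(Function(x) x.Elements("div"))
    ' Match <br>, <BR> and <br/> alike: the sample markup mixes upper- and lower-case tags
    Dim cellText As String = System.Text.RegularExpressions.Regex.Replace(
        column.InnerHtml.Trim(), "<br\s*/?>", ",",
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)
    data.Add(cellText)
Next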
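
If you would rather not do string replacement on the markup, another option is to walk the text nodes of each cell and join them with commas. The following is only a sketch: the module name CellTextHelper and the function JoinCellText are hypothetical names chosen for illustration, and it assumes the HtmlAgilityPack package is referenced by the project. Decoding entities with HtmlEntity.DeEntitize is an extra step that the original code does not perform.

```vb
Imports System.Linq
Imports HtmlAgilityPack

Module CellTextHelper
    ' Hypothetical helper (not part of HtmlAgilityPack): collects every
    ' non-empty text node under the cell and joins the pieces with commas.
    ' <br> elements contribute no text, so they act as the separators.
    Function JoinCellText(cell As HtmlNode) As String
        Dim parts = cell.Descendants().
            Where(Function(n) n.NodeType = HtmlNodeType.Text).
            Select(Function(n) HtmlEntity.DeEntitize(n.InnerText).Trim()).
            Where(Function(s) s.Length > 0)
        Return String.Join(",", parts)
    End Function
End Module
```

With this helper, the inner loop from the question can keep iterating over row.Descendants("td") and call data.Add(JoinCellText(column)). Because every td still yields exactly one value, a cell without a div would produce an empty string rather than being skipped, which keeps the columns aligned with the header.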