How to avoid repetition of data extracted from an HTML source using HtmlAgilityPack

I'm using HtmlAgilityPack to extract data from an HTML source. Here is a sample of the HTML:

<div class="enum-container">
    <div class="enum">
        <span class="field-key">MD5</span> a4188cf2b9189f82b855350233a307eb
    </div>
    <div class="enum">
        <span class="field-key">SHA1</span> c3eedd67a14810b8c639eb77ed2731e574245b2a
    </div>
    <div class="enum">
        <span class="field-key">File size</span>
        3.8 KB ( 3854 bytes )
    </div>
</div>

I'm using this code:

    Dim Table2 As New DataTable()
    Table2.Columns.Add("Value1", GetType(String))
    Table2.Columns.Add("Value2", GetType(String))

    For Each row1 As HtmlNode In doc.DocumentNode.SelectNodes("//div[@id='file-details']//div[@class='enum-container']//div[@class='enum']")
        Dim MyValue1 As HtmlNode = row1.SelectSingleNode("//span[@class='field-key']")
        Dim MyValue2 As String = row1.InnerText
        Table2.Rows.Add(MyValue1.InnerText, MyValue2)
    Next

    DataGridView3.DataSource = Table2

The result looks like this:

http://i.stack.imgur.com/vPriY.png

As you can see, the first column gets a repeated value (MD5) in every row.

What I want is this:

http://i.stack.imgur.com/jlsk5.png

Thanks.

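An XPath expression that starts with "//" searches from the root of the whole document, even when you call it on row1, so every iteration selects the first matching span in the document (the MD5 one). Remove the "//" from your XPath so that it selects the direct child of the current node instead. Note also that row1.InnerText contains the key text as well as the value, so read the value from the span's next sibling.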

C#

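// Requires: using System.Data; using HtmlAgilityPack;
// Assumes `document` is an HtmlDocument that has already been loaded.
DataTable fileDetailsTable = new DataTable();
fileDetailsTable.Columns.Add("Key", typeof(string));
fileDetailsTable.Columns.Add("Value", typeof(string));

HtmlNodeCollection enumNodes = document.DocumentNode.SelectNodes("//div[@id='file-details']//div[@class='enum-container']//div[@class='enum']");

// SelectNodes returns null (not an empty collection) when nothing matches.
if (enumNodes != null)
{
    foreach (HtmlNode enumNode in enumNodes)
    {
        // Select the child span relative to the current enum node (no leading "//").
        HtmlNode fieldKeyNode = enumNode.SelectSingleNode("span[@class='field-key']");

        if (fieldKeyNode != null && fieldKeyNode.NextSibling != null)
        {
            // Grab the key.
            string fieldKey = fieldKeyNode.InnerText.Trim();

            // Grab the value, which is the text node that follows the key span.
            // Trim() removes the surrounding whitespace and line breaks.
            string fieldValue = fieldKeyNode.NextSibling.InnerText.Trim();

            fileDetailsTable.Rows.Add(fieldKey, fieldValue);
        }
    }
}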

VB.NET

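' Requires: Imports System.Data and Imports HtmlAgilityPack
' Assumes document is an HtmlDocument that has already been loaded.
Dim fileDetailsTable As New DataTable()
fileDetailsTable.Columns.Add("Key", GetType(String))
fileDetailsTable.Columns.Add("Value", GetType(String))

Dim enumNodes As HtmlNodeCollection = document.DocumentNode.SelectNodes("//div[@id='file-details']//div[@class='enum-container']//div[@class='enum']")

' SelectNodes returns Nothing (not an empty collection) when nothing matches.
If enumNodes IsNot Nothing Then
    For Each enumNode As HtmlNode In enumNodes
        ' Select the child span relative to the current enum node (no leading "//").
        Dim fieldKeyNode As HtmlNode = enumNode.SelectSingleNode("span[@class='field-key']")

        If fieldKeyNode IsNot Nothing AndAlso fieldKeyNode.NextSibling IsNot Nothing Then
            ' Grab the key.
            Dim fieldKey As String = fieldKeyNode.InnerText.Trim()

            ' Grab the value, which is the text node that follows the key span.
            ' Trim() removes the surrounding whitespace and line breaks.
            Dim fieldValue As String = fieldKeyNode.NextSibling.InnerText.Trim()

            fileDetailsTable.Rows.Add(fieldKey, fieldValue)
        End If
    Next
End If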
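
If you want to see the effect of the XPath context for yourself, here is a minimal, self-contained C# sketch. It is only an illustration under a few assumptions: the class name XPathContextDemo and the helper KeysFor are hypothetical names made up for this example, the HTML is a condensed copy of the sample from the question (without the outer file-details div), and the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical demo class; not part of the original question or answer.
public static class XPathContextDemo
{
    // Condensed copy of the sample HTML from the question.
    public const string SampleHtml = @"
<div class=""enum-container"">
    <div class=""enum""><span class=""field-key"">MD5</span> a4188cf2b9189f82b855350233a307eb</div>
    <div class=""enum""><span class=""field-key"">SHA1</span> c3eedd67a14810b8c639eb77ed2731e574245b2a</div>
    <div class=""enum""><span class=""field-key"">File size</span> 3.8 KB ( 3854 bytes )</div>
</div>";

    // Evaluates `xpath` once per div.enum, with that div as the context node,
    // and returns the trimmed text of whatever node was selected.
    public static List<string> KeysFor(string xpath)
    {
        var document = new HtmlDocument();
        document.LoadHtml(SampleHtml);

        var keys = new List<string>();
        HtmlNodeCollection enumNodes = document.DocumentNode.SelectNodes("//div[@class='enum']");
        if (enumNodes == null)
        {
            return keys; // SelectNodes returns null when nothing matches.
        }

        foreach (HtmlNode enumNode in enumNodes)
        {
            HtmlNode keyNode = enumNode.SelectSingleNode(xpath);
            keys.Add(keyNode == null ? "(none)" : keyNode.InnerText.Trim());
        }
        return keys;
    }

    public static void Main()
    {
        // "//" restarts the search at the document root for every row.
        Console.WriteLine(string.Join(", ", KeysFor("//span[@class='field-key']")));
        // A bare "span" looks only at direct children of the current node.
        Console.WriteLine(string.Join(", ", KeysFor("span[@class='field-key']")));
        // ".//" searches all descendants of the current node.
        Console.WriteLine(string.Join(", ", KeysFor(".//span[@class='field-key']")));
    }
}
```

With this sample, the first line should repeat MD5 three times (the behaviour from the question), while the second and third lines should list MD5, SHA1, File size. Use the ".//" form rather than the bare "span" form if the span may be nested deeper than a direct child of the enum div.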