How to avoid repetition of data extracted from an HTML source using HtmlAgilityPack
I am using HtmlAgilityPack to extract data from an HTML source.
Here is a sample of the HTML:
<div class="enum-container">
<div class="enum">
<span class="field-key">MD5</span> a4188cf2b9189f82b855350233a307eb
</div>
<div class="enum">
<span class="field-key">SHA1</span> c3eedd67a14810b8c639eb77ed2731e574245b2a
</div>
<div class="enum">
<span class="field-key">File size</span>
3.8 KB ( 3854 bytes )
</div>
</div>
I am using this code:
Dim Table2 As New DataTable()
Table2.Columns.Add("Value1", GetType(String))
Table2.Columns.Add("Value2", GetType(String))
For Each row1 As HtmlNode In doc.DocumentNode.SelectNodes("//div[@id='file-details']//div[@class='enum-container']//div[@class='enum']")
Dim MyValue1 As HtmlNode = row1.SelectSingleNode("//span[@class='field-key']")
Dim MyValue2 As String = row1.InnerText
Table2.Rows.Add(MyValue1.InnerText, MyValue2)
Next
DataGridView3.DataSource = Table2
The result looks like this:
http://i.stack.imgur.com/vPriY.png
As you can see, the first column shows a repeated value (MD5).
What I want is this:
http://i.stack.imgur.com/jlsk5.png
Thank you.
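You are selecting the first span in the whole document: an XPath that begins with "//" searches from the document root, even when you call it on a child node. Remove the "//" from your XPath so that it selects only the direct child of the current node.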
C#
DataTable fileDetailsTable = new DataTable();
fileDetailsTable.Columns.Add("Key", typeof(string));
fileDetailsTable.Columns.Add("Value", typeof(string));
HtmlNodeCollection enumNodes = document.DocumentNode.SelectNodes("//div[@id='file-details']//div[@class='enum-container']//div[@class='enum']");
// SelectNodes returns null when nothing matches, so guard before looping.
if (enumNodes != null)
{
    foreach (HtmlNode enumNode in enumNodes)
    {
        // Select the child span of the current enum node (relative XPath, no leading "//").
        HtmlNode fieldKeyNode = enumNode.SelectSingleNode("span[@class='field-key']");
        if (fieldKeyNode != null && fieldKeyNode.NextSibling != null)
        {
            // Grab the key.
            string fieldKey = fieldKeyNode.InnerText.Trim();
            // Grab the value, which is the text node that follows the span.
            string fieldValue = fieldKeyNode.NextSibling.InnerText.Trim();
            fileDetailsTable.Rows.Add(fieldKey, fieldValue);
        }
    }
}
VB.NET
Dim fileDetailsTable As New DataTable()
fileDetailsTable.Columns.Add("Key", GetType(String))
fileDetailsTable.Columns.Add("Value", GetType(String))
Dim enumNodes As HtmlNodeCollection = document.DocumentNode.SelectNodes("//div[@id='file-details']//div[@class='enum-container']//div[@class='enum']")
' SelectNodes returns Nothing when nothing matches, so guard before looping.
If enumNodes IsNot Nothing Then
    For Each enumNode As HtmlNode In enumNodes
        ' Select the child span of the current enum node (relative XPath, no leading "//").
        Dim fieldKeyNode As HtmlNode = enumNode.SelectSingleNode("span[@class='field-key']")
        If fieldKeyNode IsNot Nothing AndAlso fieldKeyNode.NextSibling IsNot Nothing Then
            ' Grab the key.
            Dim fieldKey As String = fieldKeyNode.InnerText.Trim()
            ' Grab the value, which is the text node that follows the span.
            Dim fieldValue As String = fieldKeyNode.NextSibling.InnerText.Trim()
            fileDetailsTable.Rows.Add(fieldKey, fieldValue)
        End If
    Next
End If
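To see why the leading "//" matters, here is a minimal C# sketch that runs the same lookup with three different XPath expressions against a cut-down copy of the sample HTML. It assumes the HtmlAgilityPack NuGet package is referenced. The names `XPathScopeDemo`, `KeysFor` and `SampleHtml` are made up for this illustration, and the outer `file-details` div is assumed from the XPath in the question, since it is not part of the posted HTML.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathScopeDemo
{
    // Sample markup modeled on the question; the outer "file-details" div is an assumption.
    public const string SampleHtml = @"
<div id=""file-details"">
  <div class=""enum-container"">
    <div class=""enum""><span class=""field-key"">MD5</span> a4188cf2b9189f82b855350233a307eb</div>
    <div class=""enum""><span class=""field-key"">SHA1</span> c3eedd67a14810b8c639eb77ed2731e574245b2a</div>
  </div>
</div>";

    // Runs spanXPath against every enum div and collects the key text it finds.
    public static List<string> KeysFor(string spanXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);

        var keys = new List<string>();
        HtmlNodeCollection enumNodes = doc.DocumentNode.SelectNodes("//div[@class='enum']");
        if (enumNodes == null)
        {
            return keys; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode enumNode in enumNodes)
        {
            HtmlNode span = enumNode.SelectSingleNode(spanXPath);
            if (span != null)
            {
                keys.Add(span.InnerText);
            }
        }
        return keys;
    }

    public static void Main()
    {
        // Absolute path: always resolves from the document root, so the first span wins every time.
        Console.WriteLine(string.Join(", ", KeysFor("//span[@class='field-key']")));  // MD5, MD5
        // Relative descendant search: scoped to the current enum div.
        Console.WriteLine(string.Join(", ", KeysFor(".//span[@class='field-key']"))); // MD5, SHA1
        // Relative child step: only direct children of the current enum div.
        Console.WriteLine(string.Join(", ", KeysFor("span[@class='field-key']")));    // MD5, SHA1
    }
}
```

Both relative forms fix the duplication. The plain `span[...]` step only matches direct children, while `.//span[...]` also finds spans nested deeper inside each enum div, which may be the safer choice if the markup is not under your control.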