HtmlAgilityPack - 选择节点
HtmlAgilityPack - SelectNodes
我正在尝试检索 <p class>
元素。
<div class="thread-plate__details">
<h3 class="thread-plate__title">(S) HexHunter BOW</h3>
<p class="thread-plate__summary">created by Aazoth</p> <!-- (THIS ONE) -->
</div>
但运气不好。
我使用的代码如下:
' the example url to scrape
Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
Dim source As String = GetSource(url)
If source IsNot Nothing Then
' create a new html document and load the pages source
Dim htmlDocument As New HtmlDocument
htmlDocument.LoadHtml(source)
' Create a new collection of all href tags
Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[@class]")
' Using LINQ get all href values that start with http://
' of course there are others such as www.
Dim links =
(
From node
In nodes
Let attribute = node.Attributes("class")
Where attribute.Value.StartsWith("created by ")
Select attribute.Value
)
Me.ListBox1a.Items.AddRange(links.ToArray)
Dim o, j As Long
For o = 0 To ListBox1a.Items.Count - 1
For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
If ListBox1a.Items(o) = ListBox1a.Items(j) Then
ListBox1a.Items.Remove(ListBox1a.Items((j)))
End If
Next
Next
For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")
Next
For Each s As String In ListBox1a.Items
Dim lvi As New NetSeal.NSListView
lvi.Text = s
NsListView1.Items.Add(lvi.Text)
Next
它可以运行,但我无法获取 'created by XXX' 文本。
我已经尝试了很多方法,但没有运气,我将不胜感激。
提前谢谢大家。
看起来您在 attribute.Value
中查找的字符串有误。我看到的是 attribute.Value.StartsWith("created by ")
必须改成这个 attribute.Value.StartsWith("thread-plate__summary")
.
要获取节点的内部内容,您必须这样做:Select node.InnerText
;
' the example url to scrape
Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
Dim source As String = GetSource(url)
If source IsNot Nothing Then
' create a new html document and load the pages source
Dim htmlDocument As New HtmlDocument
htmlDocument.LoadHtml(source)
' Create a new collection of all href tags
Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[@class]")
' Using LINQ get all href values that start with http://
' of course there are others such as www.
Dim links =
(
From node
In nodes
Let attribute = node.Attributes("class")
Where attribute.Value.StartsWith("thread-plate__summary")
Select node.InnerText
)
Me.ListBox1a.Items.AddRange(links.ToArray)
Dim o, j As Long
For o = 0 To ListBox1a.Items.Count - 1
For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
If ListBox1a.Items(o) = ListBox1a.Items(j) Then
ListBox1a.Items.Remove(ListBox1a.Items((j)))
End If
Next
Next
For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")
Next
For Each s As String In ListBox1a.Items
Dim lvi As New NetSeal.NSListView
lvi.Text = s
NsListView1.Items.Add(lvi.Text)
Next
我希望这对你有用。
我正在尝试检索 <p class>
元素。
<div class="thread-plate__details">
<h3 class="thread-plate__title">(S) HexHunter BOW</h3>
<p class="thread-plate__summary">created by Aazoth</p> <!-- (THIS ONE) -->
</div>
但运气不好。
我使用的代码如下:
' the example url to scrape
Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
Dim source As String = GetSource(url)
If source IsNot Nothing Then
' create a new html document and load the pages source
Dim htmlDocument As New HtmlDocument
htmlDocument.LoadHtml(source)
' Create a new collection of all href tags
Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[@class]")
' Using LINQ get all href values that start with http://
' of course there are others such as www.
Dim links =
(
From node
In nodes
Let attribute = node.Attributes("class")
Where attribute.Value.StartsWith("created by ")
Select attribute.Value
)
Me.ListBox1a.Items.AddRange(links.ToArray)
Dim o, j As Long
For o = 0 To ListBox1a.Items.Count - 1
For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
If ListBox1a.Items(o) = ListBox1a.Items(j) Then
ListBox1a.Items.Remove(ListBox1a.Items((j)))
End If
Next
Next
For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")
Next
For Each s As String In ListBox1a.Items
Dim lvi As New NetSeal.NSListView
lvi.Text = s
NsListView1.Items.Add(lvi.Text)
Next
它可以运行,但我无法获取 'created by XXX' 文本。 我已经尝试了很多方法,但没有运气,我将不胜感激。
提前谢谢大家。
看起来您在 attribute.Value
中查找的字符串有误。我看到的是 attribute.Value.StartsWith("created by ")
必须改成这个 attribute.Value.StartsWith("thread-plate__summary")
.
要获取节点的内部内容,您必须这样做:Select node.InnerText
;
' the example url to scrape
Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
Dim source As String = GetSource(url)
If source IsNot Nothing Then
' create a new html document and load the pages source
Dim htmlDocument As New HtmlDocument
htmlDocument.LoadHtml(source)
' Create a new collection of all href tags
Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[@class]")
' Using LINQ get all href values that start with http://
' of course there are others such as www.
Dim links =
(
From node
In nodes
Let attribute = node.Attributes("class")
Where attribute.Value.StartsWith("thread-plate__summary")
Select node.InnerText
)
Me.ListBox1a.Items.AddRange(links.ToArray)
Dim o, j As Long
For o = 0 To ListBox1a.Items.Count - 1
For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
If ListBox1a.Items(o) = ListBox1a.Items(j) Then
ListBox1a.Items.Remove(ListBox1a.Items((j)))
End If
Next
Next
For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")
Next
For Each s As String In ListBox1a.Items
Dim lvi As New NetSeal.NSListView
lvi.Text = s
NsListView1.Items.Add(lvi.Text)
Next
我希望这对你有用。