HtmlAgilityPack - 选择节点

HtmlAgilityPack - SelectNodes

我正在尝试检索 <p class> 元素。

<div class="thread-plate__details">
    <h3 class="thread-plate__title">(S) HexHunter BOW</h3>
    <p class="thread-plate__summary">created by Aazoth</p>  <!-- (THIS ONE) -->
</div>

但运气不好。

我使用的代码如下:

' the example url to scrape
            Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
            Dim source As String = GetSource(url)

            If source IsNot Nothing Then
                ' create a new html document and load the pages source
                Dim htmlDocument As New HtmlDocument
                htmlDocument.LoadHtml(source)

                ' Create a new collection of all href tags
                Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[@class]")

                ' Using LINQ get all href values that start with http://
                ' of course there are others such as www.
                Dim links =
                    (
                        From node
                        In nodes
                        Let attribute = node.Attributes("class")
                        Where attribute.Value.StartsWith("created by ")
                        Select attribute.Value
                    )

                Me.ListBox1a.Items.AddRange(links.ToArray)
                Dim o, j As Long
                For o = 0 To ListBox1a.Items.Count - 1
                    For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
                        If ListBox1a.Items(o) = ListBox1a.Items(j) Then
                            ListBox1a.Items.Remove(ListBox1a.Items((j)))
                        End If
                    Next
                Next
                For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
                    Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")

                Next

                For Each s As String In ListBox1a.Items
                    Dim lvi As New NetSeal.NSListView
                    lvi.Text = s
                    NsListView1.Items.Add(lvi.Text)

                Next

它可以运行,但我无法获取 'created by XXX' 文本。 我已经尝试了很多方法,但没有运气,我将不胜感激。

提前谢谢大家。

看起来您在 attribute.Value 中查找的字符串有误。我看到的是 attribute.Value.StartsWith("created by ") 必须改成这个 attribute.Value.StartsWith("thread-plate__summary").

要获取节点的内部内容,您必须这样做:Select node.InnerText

' the example url to scrape
Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
Dim source As String = GetSource(url)

If source IsNot Nothing Then
    ' create a new html document and load the pages source
    Dim htmlDocument As New HtmlDocument
    htmlDocument.LoadHtml(source)

    ' Create a new collection of all href tags
    Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[@class]")

    ' Using LINQ get all href values that start with http://
    ' of course there are others such as www.
    Dim links =
        (
            From node
            In nodes
            Let attribute = node.Attributes("class")
            Where attribute.Value.StartsWith("thread-plate__summary")
            Select node.InnerText
        )

    Me.ListBox1a.Items.AddRange(links.ToArray)
    Dim o, j As Long
    For o = 0 To ListBox1a.Items.Count - 1
        For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
            If ListBox1a.Items(o) = ListBox1a.Items(j) Then
                ListBox1a.Items.Remove(ListBox1a.Items((j)))
            End If
        Next
    Next
    For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
        Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")

    Next

    For Each s As String In ListBox1a.Items
        Dim lvi As New NetSeal.NSListView
        lvi.Text = s
        NsListView1.Items.Add(lvi.Text)

    Next

我希望这对你有用。