Scrape text from a WebBrowser control using HTML Agility Pack (VB.NET)

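I want to use HTML Agility Pack to extract fields/text from a WebBrowser control on a Windows Form. I can already scrape the text in the background, but I want to do it from the WebBrowser inside my form.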

I tried assigning WebBrowser1.Document to my HtmlDocument variable, but I can't seem to convert it.

This is the error I get:

These are the variable types:

Here is my code:

Imports System
Imports System.Xml
Imports HtmlAgilityPack


Public Class Form1

    Private Sub Form1_load(sender As System.Object, e As EventArgs) Handles MyBase.Load

        WebBrowser1.Navigate(TextBox3.Text)

    End Sub

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click

        Dim link As String = TextBox3.Text
        Dim doc As HtmlDocument = New HtmlWeb().Load(link)
        Dim web_document As HtmlDocument = WebBrowser1.Document

        Dim name As HtmlNode = doc.DocumentNode.SelectSingleNode("//*[@id='details']/div[2]/div[2]/div/div[1]/h3")
        'if the div is found, print the inner text'
        If Not name Is Nothing Then
            TextBox1.Text = name.InnerText.Trim()

        End If


        Dim customer_number As HtmlNode = doc.DocumentNode.SelectSingleNode("//*[@id='details']/div[2]/div[2]/div/div[2]/dl[4]/dd")
        'if the div is found, print the inner text'
        If Not customer_number Is Nothing Then
            TextBox2.Text = customer_number.InnerText.Trim()

        End If

        MessageBox.Show("Doc variable: " + doc.GetType.ToString + Environment.NewLine + "web_document variable: " + web_document.GetType.ToString)

    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted

    End Sub
End Class

The problem is that WebBrowser1.Document returns a System.Windows.Forms.HtmlDocument, which is not the same type as HtmlAgilityPack.HtmlDocument, so one cannot be assigned to the other.

If you want to use HtmlAgilityPack to scrape HTML from the page shown in the WebBrowser control, take the DocumentText from the browser control and load it into a new HtmlAgilityPack.HtmlDocument instance, like this:

Dim doc As New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(WebBrowser1.DocumentText)
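
To show where those two lines might go, here is a minimal sketch that parses the page once the WebBrowser has finished loading it, rather than downloading it a second time with HtmlWeb. It assumes the form from the question (WebBrowser1, TextBox1, TextBox2, TextBox3 created in the designer) and reuses the question's XPath expressions unchanged. Placing the parsing in DocumentCompleted and checking ReadyState are my assumptions, not something the question requires.

```vbnet
Imports System
Imports System.Windows.Forms

Public Class Form1

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        WebBrowser1.Navigate(TextBox3.Text)
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        ' DocumentCompleted can fire more than once (for example, once per frame),
        ' so skip the event until the control reports the whole page as complete.
        If WebBrowser1.ReadyState <> WebBrowserReadyState.Complete Then Return

        ' Fully qualify the type: System.Windows.Forms also defines an HtmlDocument.
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(WebBrowser1.DocumentText)

        ' Same XPath expressions as in the question.
        Dim name As HtmlAgilityPack.HtmlNode = doc.DocumentNode.SelectSingleNode("//*[@id='details']/div[2]/div[2]/div/div[1]/h3")
        If name IsNot Nothing Then
            TextBox1.Text = name.InnerText.Trim()
        End If

        Dim customerNumber As HtmlAgilityPack.HtmlNode = doc.DocumentNode.SelectSingleNode("//*[@id='details']/div[2]/div[2]/div/div[2]/dl[4]/dd")
        If customerNumber IsNot Nothing Then
            TextBox2.Text = customerNumber.InnerText.Trim()
        End If
    End Sub

End Class
```

Note that DocumentText may not include content that the page adds later through script. If the fields you need are generated that way, passing WebBrowser1.Document.Body.OuterHtml to LoadHtml instead is one option to try.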