Display results scraped from a webpage in a TreeView control from a class

I am working on a Visual Basic project. My working environment is:

At this stage, I have:

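I need to scrape a webpage (e.g. https://www.example.com) and display the scraped results in a TreeView control placed on Form1. I have tried a few approaches and they work fine, except that they require the WebBrowser control, which I do not want to use. I found a method that I am using now, but it does not seem to let me display the results on the form.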

Here is my Class1.vb code, which works fine:

    Imports System.Threading.Tasks

    Public Class Class1
        ' Not used by the code shown here.
        Private Event DocumentCompleted As WebBrowserDocumentCompletedEventHandler
        Private ManufacturersURi As New Uri("https://www.example.com/Webpage.php3")
        Public ManList As New List(Of TreeNode)

        Public Sub GetHelpPage()
            ' Create a WebBrowser instance.
            Dim webBrowserForPrinting As New WebBrowser() With {.ScriptErrorsSuppressed = True}
            ' Add an event handler that scrapes the data after the document loads.
            AddHandler webBrowserForPrinting.DocumentCompleted, New _
                WebBrowserDocumentCompletedEventHandler(AddressOf GetManu_Name)
            ' Set the Url property to load the document.
            webBrowserForPrinting.Url = ManufacturersURi
        End Sub

        Private Sub GetManu_Name(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
            Dim webBrowserForPrinting As WebBrowser = CType(sender, WebBrowser)
            Dim Divs = webBrowserForPrinting.Document.Body.GetElementsByTagName("Div")
            ' Scrape the document now that it is fully loaded.
            Dim T As Task(Of List(Of TreeNode)) =
                Task.Run(Function()
                             Dim LinksCount As Integer = 0
                             For Each Div As HtmlElement In Divs
                                 If InStr(Div.GetAttribute("ClassName").ToString, "Div-Name", CompareMethod.Text) > 0 Then
                                     LinksCount = Div.GetElementsByTagName("a").Count - 1
                                     For I As Integer = 0 To LinksCount
                                         ' Keep only the text before the first <BR>.
                                         Dim Txt() As String = Div.GetElementsByTagName("a").Item(I).InnerHtml.
                                             Split({"<BR>"}, StringSplitOptions.None)
                                         Dim Manu_TreeNode As New TreeNode() With
                                             {.Name = I.ToString, .Text = Txt(0)}
                                         ManList.Add(Manu_TreeNode)
                                     Next
                                 End If
                             Next
                             Return ManList
                         End Function)
            Debug.WriteLine(T.Result.Count) ' Result is 116
            ' Dispose the WebBrowser now that the task is complete.
            webBrowserForPrinting.Dispose()
        End Sub
    End Class

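The code above yields 116 TreeNodes, which is the number of tags I scraped. Now, when I try to display this result in Form1_Load, nothing happens, because the form loads before the code has finished executing.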

Here is the Form1_Load code:

    Public Class Form1
        Dim ThisClass As New Class1
        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            ThisClass.GetHelpPage()
            TreeView1.Nodes.Clear()
            For I As Integer = 0 To ThisClass.ManList.Count - 1
                TreeView1.Nodes.Add(ThisClass.ManList(I))
            Next
        End Sub
    End Class

I noticed that if I placed an empty msgbox("") in the Form1_Load somewhere before For..Next, it forces the Form1_Load Event to wait and successfully populates the TreeView Control.

What am I doing wrong, or what am I missing?

Try this:

    Public Class Form1
        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            Dim ThisClass As New Class1
            ThisClass.GetHelpPage() ' start loading the page
            Dim attempts As Integer = 0
            Do Until ThisClass.IsCompleted
                Threading.Thread.Sleep(100)
                ' Let the message loop run so DocumentCompleted can fire.
                Application.DoEvents()
                ' Give up if the document takes too much time.
                attempts += 1
                If attempts > 30 Then Exit Do ' more than 3 sec
            Loop

            TreeView1.Nodes.Clear()
            For I As Integer = 0 To ThisClass.ManList.Count - 1
                TreeView1.Nodes.Add(ThisClass.ManList(I))
            Next
        End Sub
    End Class

    Class Class1
        Private Completed As Boolean = False

        ReadOnly Property IsCompleted As Boolean
            Get
                Return Completed
            End Get
        End Property

        Private Sub GetManu_Name(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
            ' your code
            Completed = True
        End Sub

        ' The rest of Class1 (GetHelpPage, ManList) stays as in the question.
    End Class

I noticed that if I placed an empty msgbox("") in the Form1_Load somewhere before For..Next, it forces the Form1_Load Event to wait and successfully populates the TreeView Control.

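Yes, if you keep it open long enough for the task in the GetManu_Name method to complete, it acts like an await. Since MsgBox is a modal window, it blocks execution of the next line until it is closed.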

Now, you can either make it a fully synchronous call by removing Task.Run(...) from the GetManu_Name method, or use the asynchronous pattern as follows:

    Public Class WebStuff

        Public Shared Async Function ToTreeNodes(url As String) As Task(Of IEnumerable(Of TreeNode))
            Dim tcsNavigated As New TaskCompletionSource(Of Boolean)
            Dim tcsCompleted As New TaskCompletionSource(Of Boolean)
            Dim nodes As New List(Of TreeNode)

            Using wb As New WebBrowser With {.ScriptErrorsSuppressed = True}
                AddHandler wb.Navigated,
                    Sub(s, e)
                        If tcsNavigated.Task.IsCompleted Then Return
                        tcsNavigated.SetResult(True)
                    End Sub

                AddHandler wb.DocumentCompleted,
                    Sub(s, e)
                        If wb.ReadyState <> WebBrowserReadyState.Complete OrElse
                        tcsCompleted.Task.IsCompleted Then Return
                        tcsCompleted.SetResult(True)
                    End Sub

                wb.Navigate(url)

                Await tcsNavigated.Task
                'Navigated.. if you need to do something here...
                Await tcsCompleted.Task
                'DocumentCompleted.. Now we can process the Body...

                Dim Divs = wb.Document.Body.GetElementsByTagName("Div")
                Dim LinksCount As Integer = 0

                For Each Div As HtmlElement In Divs
                    If Div.GetAttribute("ClassName").
                        IndexOf("Div-Name", StringComparison.InvariantCultureIgnoreCase) > -1 Then
                        LinksCount = Div.GetElementsByTagName("a").Count - 1
                        For I As Integer = 0 To LinksCount
                            Dim Txt = Div.GetElementsByTagName("a").Item(I).InnerHtml.
                                Split({"<BR>"}, StringSplitOptions.RemoveEmptyEntries)
                            Dim n As New TreeNode With {
                                .Name = I.ToString, .Text = Txt.FirstOrDefault
                            }
                            nodes.Add(n)
                        Next
                    End If
                Next
            End Using

            Return nodes
        End Function

    End Class
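The two TaskCompletionSource objects above turn the Navigated and DocumentCompleted events into tasks that can be awaited. The same idea can be sketched without a WebBrowser. This is only an illustration: FakeLoader, its Completed event, and the Demo module are hypothetical names that stand in for any component that reports completion through an event.

```vbnet
Imports System
Imports System.Threading.Tasks

' Hypothetical stand-in for a component that signals completion through an event.
Public Class FakeLoader
    Public Event Completed As EventHandler

    Public Sub Start()
        ' Simulate work that finishes later on a thread-pool thread.
        Task.Delay(200).ContinueWith(Sub(t) OnCompleted())
    End Sub

    Private Sub OnCompleted()
        RaiseEvent Completed(Me, EventArgs.Empty)
    End Sub
End Class

Public Module Demo
    ' Wrap the event in a Task so that callers can Await it.
    Public Function WaitForCompletedAsync(loader As FakeLoader) As Task(Of Boolean)
        Dim tcs As New TaskCompletionSource(Of Boolean)
        ' TrySetResult returns False instead of throwing if the task is already completed.
        AddHandler loader.Completed, Sub(s, e) tcs.TrySetResult(True)
        loader.Start()
        Return tcs.Task
    End Function

    Public Async Function RunAsync() As Task(Of Boolean)
        Dim loader As New FakeLoader()
        ' Execution resumes here only after the Completed event has fired.
        Return Await WaitForCompletedAsync(loader)
    End Function

    Public Sub Main()
        Console.WriteLine(RunAsync().GetAwaiter().GetResult())
    End Sub
End Module
```

TrySetResult is used in this sketch so that a repeated event cannot throw. The ToTreeNodes code guards SetResult with an IsCompleted check for the same reason.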

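Notes on the method:

You need to add the Async modifier to the caller's signature in order to call the function and await the result. For example, in the Form.Load event handler: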

    Private Async Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        Dim nodes = Await WebStuff.ToTreeNodes("www....")
        TreeView1.Nodes.AddRange(nodes.ToArray)
    End Sub

Or from a separate Async method:

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        PopulateTree()
    End Sub

    Private Async Sub PopulateTree()
        Dim nodes = Await WebStuff.ToTreeNodes("www....")
        TreeView1.Nodes.AddRange(nodes.ToArray)
    End Sub
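The other option mentioned earlier, dropping Task.Run(...) and scraping synchronously inside the DocumentCompleted handler, is not shown above. Here is a minimal sketch of how it might look. It rests on a few assumptions: NodesReady is a hypothetical event added for illustration, TreeView1 comes from the designer as in the question, and the page markup is the same as in the question. The handler still runs only after Form1_Load has returned, so the form fills the tree when the event arrives instead of reading ManList straight away.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Windows.Forms

Public Class Class1
    ' Hypothetical event: raised once the nodes have been scraped.
    Public Event NodesReady(nodes As List(Of TreeNode))

    Private ReadOnly ManufacturersUri As New Uri("https://www.example.com/Webpage.php3")

    Public Sub GetHelpPage()
        Dim wb As New WebBrowser() With {.ScriptErrorsSuppressed = True}
        AddHandler wb.DocumentCompleted, AddressOf GetManu_Name
        wb.Url = ManufacturersUri
    End Sub

    Private Sub GetManu_Name(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
        Dim wb = DirectCast(sender, WebBrowser)
        ' DocumentCompleted can fire more than once; wait for the whole page.
        If wb.ReadyState <> WebBrowserReadyState.Complete Then Return

        ' No Task.Run: the scraping runs directly on the UI thread.
        Dim nodes As New List(Of TreeNode)
        For Each div As HtmlElement In wb.Document.Body.GetElementsByTagName("Div")
            If div.GetAttribute("ClassName").
                IndexOf("Div-Name", StringComparison.OrdinalIgnoreCase) < 0 Then Continue For
            Dim index As Integer = 0
            For Each link As HtmlElement In div.GetElementsByTagName("a")
                Dim parts = link.InnerHtml.Split({"<BR>"}, StringSplitOptions.RemoveEmptyEntries)
                nodes.Add(New TreeNode With {.Name = index.ToString(), .Text = parts.FirstOrDefault()})
                index += 1
            Next
        Next

        RemoveHandler wb.DocumentCompleted, AddressOf GetManu_Name
        wb.Dispose()
        RaiseEvent NodesReady(nodes)
    End Sub
End Class

Public Class Form1
    Private WithEvents ThisClass As New Class1

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        ' Only starts the navigation; the nodes arrive later through NodesReady.
        ThisClass.GetHelpPage()
    End Sub

    Private Sub ThisClass_NodesReady(nodes As List(Of TreeNode)) Handles ThisClass.NodesReady
        TreeView1.Nodes.Clear()
        TreeView1.Nodes.AddRange(nodes.ToArray())
    End Sub
End Class
```

Because DocumentCompleted is raised on the thread that created the WebBrowser, the NodesReady handler in this sketch can update TreeView1 directly.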