在 Class 的 Treeview 控件中显示从网页抓取的结果
Display results scraped from Webpage in Treeview Control from Class
我正在处理一个 Visual Basic 项目。我的工作环境是:
- Windows10个32位
- Visual Studio 2015
- .Net Framework 4.8
- Winform
在这个阶段,我有:
- Class (Class1.vb)
- 带有 TreeView 控件的 Form1 (Form1.vb)
我应该抓取一个网页(即:https://www.example.com),我想显示结果在放置在 Form1 上的 Treeview 控件中抓取。我已经尝试了一些方法并且它们工作正常,除了它们需要使用我不想使用的 Webbrowser Control 。我找到了一个我现在正在使用的方法,但它似乎不让我在表单上显示结果。
这是我的 Class1.vb 代码,它工作正常
Imports System.Threading.Tasks
Public Class Class1
' Create a WebBrowser instance.
Private Event DocumentCompleted As WebBrowserDocumentCompletedEventHandler
Private ManufacturersURi As New Uri("https://www.example.com/Webpage.php3")
Public ManList As New List(Of TreeNode)
Public Sub GettHelpPage()
' Create a WebBrowser instance.
Dim webBrowserForPrinting As New WebBrowser() With {.ScriptErrorsSuppressed = True}
' Add an event handler that Scrape Data after it loads.
AddHandler webBrowserForPrinting.DocumentCompleted, New _
WebBrowserDocumentCompletedEventHandler(AddressOf GetManu_Name)
' Set the Url property to load the document.
webBrowserForPrinting.Url = ManufacturersURi
End Sub
Private Sub GetManu_Name(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
Dim webBrowserForPrinting As WebBrowser = CType(sender, WebBrowser)
Dim Divs = webBrowserForPrinting.Document.Body.GetElementsByTagName("Div")
' Scrape the document now that it is fully loaded.
Dim T As Task(Of List(Of TreeNode)) =
Task.Run(Function()
Dim LinksCount As Integer = 0
For Each Div As HtmlElement In Divs
If InStr(Div.GetAttribute("ClassName").ToString, "Div-Name", CompareMethod.Text) Then
LinksCount = Div.GetElementsByTagName("a").Count - 1
For I As Integer = 0 To LinksCount
Dim Txt() As String = Div.GetElementsByTagName("a").Item(I).InnerHtml.Split("<BR>")
Dim Manu_TreeNode As New TreeNode() With
{.Name = I.ToString, .Text = Txt(0)}
ManList.Add(Manu_TreeNode)
Next
End If
Next
Return ManList
End Function)
' Dispose the WebBrowser now that the task is complete.
Debug.WriteLine(T.Result.Count) 'Result is 116
webBrowserForPrinting.Dispose()
End Sub
以上代码得到116个TreeNodes,这是我抓取的Tags的个数。现在,当我尝试在 Form1_Load 上显示此结果时,没有任何反应,因为表单在代码完成执行之前加载。
这里是Form1_Load代码:
Public Class Form1
Dim ThisClass As New Class1
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
ThisClass.GetHelpPage()
TreeView1.Nodes.Clear()
For I As Integer = 0 To ThisClass.ManList.Count - 1
TreeView1.Nodes.Add(ThisClass.ManList(I))
Next
End Sub
End Class
我注意到如果我在 Form1_Load 之前的某处放置一个空的 msgbox("") For..Next,它强制 Form1_Load 事件等待并成功填充 TreeView 控件。
What am I doing wrong ? or What am I missing there ?
试试这个
Public Class Form1
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
Dim ThisClass As New Class1
Dim i As Integer = 0
Do Until ThisClass.IsCompleted
Threading.Thread.Sleep(100)
'if the document takes too much time
i += 1
If i > 30 Then Exit Do 'more than 3 sec
Loop
TreeView1.Nodes.Clear()
For I As Integer = 0 To ThisClass.ManList.Count - 1
TreeView1.Nodes.Add(ThisClass.ManList(I))
Next
End Sub
End Class
Class Class1
Dim Completed As Boolean = False
ReadOnly Property IsCompleted As Boolean
Get
Return Completed
End Get
End Property
Private Sub GetManu_Name(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
'your code
Completed = True
Return ManList
End Sub
I noticed that if I placed an empty msgbox("") in the Form1_Load somewhere before For..Next, it forces the Form1_Load Event to wait and successfully populates the TreeView Control.
是的,如果您将其保持打开足够长的时间直到 GetManu_Name
方法中的任务完成,它就会发挥 await
的作用。由于 MsgBox 是模态 window,它会阻止下一行的执行,直到它被关闭。
现在,您可以通过从 GetManu_Name
方法中删除 Task.Run(...)
使其成为一个完整的同步调用,或者以如下方式使用异步模式:
Public Class WebStuff
Public Shared Async Function ToTreeNodes(url As String) As Task(Of IEnumerable(Of TreeNode))
Dim tcsNavigated As New TaskCompletionSource(Of Boolean)
Dim tcsCompleted As New TaskCompletionSource(Of Boolean)
Dim nodes As New List(Of TreeNode)
Using wb As New WebBrowser With {.ScriptErrorsSuppressed = True}
AddHandler wb.Navigated,
Sub(s, e)
If tcsNavigated.Task.IsCompleted Then Return
tcsNavigated.SetResult(True)
End Sub
AddHandler wb.DocumentCompleted,
Sub(s, e)
If wb.ReadyState <> WebBrowserReadyState.Complete OrElse
tcsCompleted.Task.IsCompleted Then Return
tcsCompleted.SetResult(True)
End Sub
wb.Navigate(url)
Await tcsNavigated.Task
'Navigated.. if you need to do something here...
Await tcsCompleted.Task
'DocumentCompeleted.. Now we can process the Body...
Dim Divs = wb.Document.Body.GetElementsByTagName("Div")
Dim LinksCount As Integer = 0
For Each Div As HtmlElement In Divs
If Div.GetAttribute("ClassName").
IndexOf("Div-Name", StringComparison.InvariantCultureIgnoreCase) > -1 Then
LinksCount = Div.GetElementsByTagName("a").Count - 1
For I As Integer = 0 To LinksCount
Dim Txt = Div.GetElementsByTagName("a").Item(I).InnerHtml.
Split({"<BR>"}, StringSplitOptions.RemoveEmptyEntries)
Dim n As New TreeNode With {
.Name = I.ToString, .Text = Txt.FirstOrDefault
}
nodes.Add(n)
Next
End If
Next
End Using
Return nodes
End Function
End Class
方法注意事项:
- 单个
Async
函数来完成冗长的任务并 returns IEnumerable(Of TreeNode) 给调用者。
- Lambda Expressions are used to add the WebBrowser.Navigated and WebBrowser.DocumentCompleted 个事件。
- TaskCompletionSource is necessary here to wait for the completion of the WebBrowser.DocumentCompleted 事件,以便能够处理 HTML 内容。
需要在调用者签名中加上Async
修饰符才能调用函数并等待结果。例如,Form.Load
事件:
Private Async Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
Dim nodes = Await WebStuff.ToTreeNodes("www....")
TreeView1.Nodes.AddRange(nodes.ToArray)
End Sub
或Async
方法:
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
PopulateTree()
End Sub
Private Async Sub PopulateTree()
Dim nodes = Await WebStuff.ToTreeNodes("www....")
TreeView1.Nodes.AddRange(nodes.ToArray)
End Sub
我正在处理一个 Visual Basic 项目。我的工作环境是:
- Windows10个32位
- Visual Studio 2015
- .Net Framework 4.8
- Winform
在这个阶段,我有:
- Class (Class1.vb)
- 带有 TreeView 控件的 Form1 (Form1.vb)
我应该抓取一个网页(即:https://www.example.com),我想显示结果在放置在 Form1 上的 Treeview 控件中抓取。我已经尝试了一些方法并且它们工作正常,除了它们需要使用我不想使用的 Webbrowser Control 。我找到了一个我现在正在使用的方法,但它似乎不让我在表单上显示结果。
这是我的 Class1.vb 代码,它工作正常
Imports System.Threading.Tasks
Public Class Class1
' Create a WebBrowser instance.
Private Event DocumentCompleted As WebBrowserDocumentCompletedEventHandler
Private ManufacturersURi As New Uri("https://www.example.com/Webpage.php3")
Public ManList As New List(Of TreeNode)
Public Sub GettHelpPage()
' Create a WebBrowser instance.
Dim webBrowserForPrinting As New WebBrowser() With {.ScriptErrorsSuppressed = True}
' Add an event handler that Scrape Data after it loads.
AddHandler webBrowserForPrinting.DocumentCompleted, New _
WebBrowserDocumentCompletedEventHandler(AddressOf GetManu_Name)
' Set the Url property to load the document.
webBrowserForPrinting.Url = ManufacturersURi
End Sub
Private Sub GetManu_Name(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
Dim webBrowserForPrinting As WebBrowser = CType(sender, WebBrowser)
Dim Divs = webBrowserForPrinting.Document.Body.GetElementsByTagName("Div")
' Scrape the document now that it is fully loaded.
Dim T As Task(Of List(Of TreeNode)) =
Task.Run(Function()
Dim LinksCount As Integer = 0
For Each Div As HtmlElement In Divs
If InStr(Div.GetAttribute("ClassName").ToString, "Div-Name", CompareMethod.Text) Then
LinksCount = Div.GetElementsByTagName("a").Count - 1
For I As Integer = 0 To LinksCount
Dim Txt() As String = Div.GetElementsByTagName("a").Item(I).InnerHtml.Split("<BR>")
Dim Manu_TreeNode As New TreeNode() With
{.Name = I.ToString, .Text = Txt(0)}
ManList.Add(Manu_TreeNode)
Next
End If
Next
Return ManList
End Function)
' Dispose the WebBrowser now that the task is complete.
Debug.WriteLine(T.Result.Count) 'Result is 116
webBrowserForPrinting.Dispose()
End Sub
以上代码得到116个TreeNodes,这是我抓取的Tags的个数。现在,当我尝试在 Form1_Load 上显示此结果时,没有任何反应,因为表单在代码完成执行之前加载。
这里是Form1_Load代码:
Public Class Form1
Dim ThisClass As New Class1
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
ThisClass.GetHelpPage()
TreeView1.Nodes.Clear()
For I As Integer = 0 To ThisClass.ManList.Count - 1
TreeView1.Nodes.Add(ThisClass.ManList(I))
Next
End Sub
End Class
我注意到如果我在 Form1_Load 之前的某处放置一个空的 msgbox("") For..Next,它强制 Form1_Load 事件等待并成功填充 TreeView 控件。
What am I doing wrong ? or What am I missing there ?
试试这个
Public Class Form1
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
Dim ThisClass As New Class1
Dim i As Integer = 0
Do Until ThisClass.IsCompleted
Threading.Thread.Sleep(100)
'if the document takes too much time
i += 1
If i > 30 Then Exit Do 'more than 3 sec
Loop
TreeView1.Nodes.Clear()
For I As Integer = 0 To ThisClass.ManList.Count - 1
TreeView1.Nodes.Add(ThisClass.ManList(I))
Next
End Sub
End Class
Class Class1
Dim Completed As Boolean = False
ReadOnly Property IsCompleted As Boolean
Get
Return Completed
End Get
End Property
Private Sub GetManu_Name(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
'your code
Completed = True
Return ManList
End Sub
I noticed that if I placed an empty msgbox("") in the Form1_Load somewhere before For..Next, it forces the Form1_Load Event to wait and successfully populates the TreeView Control.
是的,如果您将其保持打开足够长的时间直到 GetManu_Name
方法中的任务完成,它就会发挥 await
的作用。由于 MsgBox 是模态 window,它会阻止下一行的执行,直到它被关闭。
现在,您可以通过从 GetManu_Name
方法中删除 Task.Run(...)
使其成为一个完整的同步调用,或者以如下方式使用异步模式:
Public Class WebStuff
Public Shared Async Function ToTreeNodes(url As String) As Task(Of IEnumerable(Of TreeNode))
Dim tcsNavigated As New TaskCompletionSource(Of Boolean)
Dim tcsCompleted As New TaskCompletionSource(Of Boolean)
Dim nodes As New List(Of TreeNode)
Using wb As New WebBrowser With {.ScriptErrorsSuppressed = True}
AddHandler wb.Navigated,
Sub(s, e)
If tcsNavigated.Task.IsCompleted Then Return
tcsNavigated.SetResult(True)
End Sub
AddHandler wb.DocumentCompleted,
Sub(s, e)
If wb.ReadyState <> WebBrowserReadyState.Complete OrElse
tcsCompleted.Task.IsCompleted Then Return
tcsCompleted.SetResult(True)
End Sub
wb.Navigate(url)
Await tcsNavigated.Task
'Navigated.. if you need to do something here...
Await tcsCompleted.Task
'DocumentCompeleted.. Now we can process the Body...
Dim Divs = wb.Document.Body.GetElementsByTagName("Div")
Dim LinksCount As Integer = 0
For Each Div As HtmlElement In Divs
If Div.GetAttribute("ClassName").
IndexOf("Div-Name", StringComparison.InvariantCultureIgnoreCase) > -1 Then
LinksCount = Div.GetElementsByTagName("a").Count - 1
For I As Integer = 0 To LinksCount
Dim Txt = Div.GetElementsByTagName("a").Item(I).InnerHtml.
Split({"<BR>"}, StringSplitOptions.RemoveEmptyEntries)
Dim n As New TreeNode With {
.Name = I.ToString, .Text = Txt.FirstOrDefault
}
nodes.Add(n)
Next
End If
Next
End Using
Return nodes
End Function
End Class
方法注意事项:
- 单个
Async
函数来完成冗长的任务并 returns IEnumerable(Of TreeNode) 给调用者。 - Lambda Expressions are used to add the WebBrowser.Navigated and WebBrowser.DocumentCompleted 个事件。
- TaskCompletionSource is necessary here to wait for the completion of the WebBrowser.DocumentCompleted 事件,以便能够处理 HTML 内容。
需要在调用者签名中加上Async
修饰符才能调用函数并等待结果。例如,Form.Load
事件:
Private Async Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
Dim nodes = Await WebStuff.ToTreeNodes("www....")
TreeView1.Nodes.AddRange(nodes.ToArray)
End Sub
或Async
方法:
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
PopulateTree()
End Sub
Private Async Sub PopulateTree()
Dim nodes = Await WebStuff.ToTreeNodes("www....")
TreeView1.Nodes.AddRange(nodes.ToArray)
End Sub