从 HttpWebRequest 下载时缺少某些元素?

Some Elements missing while downloading from HttpWebRequest?

我正在使用 httpwebrequest 从给定的 url 下载数据,但很少有元素没有响应。

   Dim Request As HttpWebRequest = CType(WebRequest.Create("https://www.royalmail.com/track-your-item#/tracking-results/37005067200003B0F1FF2"), HttpWebRequest)
    Request.Timeout = 2 * 60 * 1000
    Request.Proxy = Nothing
    Request.AutomaticDecompression = DecompressionMethods.Deflate Or DecompressionMethods.GZip
    Request.Credentials = System.Net.CredentialCache.DefaultCredentials
    Dim HttpResp As HttpWebResponse
    HttpResp = (CType(Request.GetResponse(), HttpWebResponse))
    If HttpResp.StatusCode = HttpStatusCode.OK Then
        Dim receiveStream As Stream = HttpResp.GetResponseStream()
        Dim readStream As New StreamReader(receiveStream)
        Dim sData As String
        sData = readStream.ReadToEnd()
        readStream.Close()

    Else

    End If    

当我在 chrome 上打开 URL ( https://www.royalmail.com/track-your-item#/tracking-results/37005067200003B0F1FF2 ) 并检查元素时,我可以看到这个元素(搜索 37005067200003B0F1FF2) 但作为回应,我没有得到这个元素(搜索 37005067200003B0F1FF2)。

使用网络浏览器控件的代码

Private Sub Button10_Click(sender As Object, e As EventArgs) Handles Button10.Click


    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
    Dim sURL As String = String.Format("https://www.royalmail.com/track-your-item#/tracking-results/37005067200003B0F1FF2")
    Dim webBrowserForPrinting As WebBrowser = New WebBrowser()
    webBrowserForPrinting.ScriptErrorsSuppressed = True
    AddHandler webBrowserForPrinting.DocumentCompleted, AddressOf PrintDocument
    webBrowserForPrinting.Url = New Uri(sURL)
    webBrowserForPrinting.Navigate(sURL)

End Sub
Private Sub PrintDocument(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
    Dim HTMD As HtmlDocument
    HTMD = CType(sender, WebBrowser).Document



    Dim HTC As HtmlElementCollection
    If HTMD IsNot Nothing Then
        HTC = HTMD.All
        For Each ele As HtmlElement In HTC
            MsgBox(ele.InnerHtml)

        Next
    End If

End Sub

您需要激活 WebBrowser 高级功能才能成功完成解析过程。如果未启用这些功能,则在标准 IE7 仿真中的 WebBrowser 将无法完成文档。失败是由大量脚本错误引起的。

我添加了一个 class 静态方法 (WebBrowserAdvancedFetures) 以将所需的值添加到注册表中。
WebBrowserAdvancedFetures.ActivateWBAdvancedFeatures 在窗体的构造函数中被调用。
您可以回滚调用 WebBrowserAdvancedFetures.DeactivateWBAdvancedFeatures.

这个程序是如何工作的:

  1. 实例化一个WebBrowserclass(Private browser As WebBrowser)。我们也可以使用 WebBrowser 控件(Form 容器可以承载的可见控件版本),它是一回事。
  2. 订阅其 DocumentCompleted event. It will be raised each time one of the HtmlDocuments inside the main WebBrowser.Document is completed. Read 以获取有关 HtmlDocuments 嵌套的更多详细信息。
  3. DocumentCompleted 处理程序中,验证至少有一个文档已准备好进行解析,检查 WebBrowser.ReadyState = WebBrowserReadyState.Complete
  4. 当它是时,搜索包含我们要查找的数据的 HtmlElements。
  5. 收集完所有数据后,引发一个事件,通知解析已完成(这也允许其他 classes 的订阅者在需要时也得到通知。这需要自定义 EventArgs class,虽然)并禁用对 HtmlDocument 的进一步解析(这里,这是通过设置布尔字段完成的)。
  6. 处理新数据(这里只是一个String和一个DateTime对象),然后重置解析过程中使用的fields/variables。

记得在 Form.FormClosed 事件或自定义 class Dispose()[= 中删除处理程序55=]方法:

RemoveHandler DocumentParsingComplete, AddressOf OnDocumentParsingComplete
RemoveHandler browser.DocumentCompleted, AddressOf browser_DocumentCompleted

Public Event DocumentParsingComplete As EventHandler(Of EventArgs)

Private browser As WebBrowser = Nothing
Private trackingNumberValue As String = String.Empty
Private trackingDateValue As DateTime
Private documentParsed As Boolean = False
Private userAgent As String = "User-Agent: Mozilla/5.0 (Windows NT 10; Win64; x64; rv:48.0) Gecko/20100101 Firefox/48.0"

Public Sub New()
    InitializeComponent()
    WebBrowserAdvancedFetures.ActivateWBAdvancedFeatures(Path.GetFileName(Application.ExecutablePath))
    browser = New WebBrowser With {.ScriptErrorsSuppressed = True}
    AddHandler DocumentParsingComplete, AddressOf OnDocumentParsingComplete
    AddHandler browser.DocumentCompleted, AddressOf browser_DocumentCompleted
End Sub

Private Sub btnNavigate_Click(sender As Object, e As EventArgs) Handles btnNavigate.Click
    browser.Navigate("")
    browser.Document.OpenNew(True)
    documentParsed = False
    browser.Navigate("[Some URL]", "_self", Nothing, userAgent)
End Sub

Private Sub OnDocumentParsingComplete(sender As Object, e As EventArgs)
    ' Do whatever you need with these
    Console.WriteLine(trackingNumberValue)
    Console.WriteLine(trackingDateValue)

    'Then reset for further use
    trackingNumberValue = String.Empty
    trackingDateValue = DateTime.MinValue
End Sub

Private Sub browser_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
    Dim wb As WebBrowser = DirectCast(sender, WebBrowser)
    If wb.ReadyState <> WebBrowserReadyState.Complete OrElse wb.Document.Forms.Count = 0 OrElse documentParsed Then Return

    Dim trackingNumberClass As String = "tracking-number-value"
    Dim trackingElement = wb.Document.GetElementsByTagName("SPAN").
        OfType(Of HtmlElement)().FirstOrDefault(Function(elm) elm.GetAttribute("className").Contains(trackingNumberClass))
    Me.trackingNumberValue = trackingElement?.InnerText

    Dim trackingDateClass As String = "ng-binding ng-scope"
    Dim trackingDateElement = wb.Document.GetElementsByTagName("SPAN").
        OfType(Of HtmlElement)().FirstOrDefault(Function(elm) elm.GetAttribute("className").Equals(trackingDateClass))

    If trackingDateElement IsNot Nothing Then
        Dim deliveryDate As String = trackingDateElement.InnerText.Split().Last().TrimEnd("."c)
        Me.trackingDateValue = Date.ParseExact(deliveryDate, "dd-MM-yyyy", Nothing)
        If Not String.IsNullOrEmpty(trackingNumberValue) Then
            documentParsed = True
            RaiseEvent DocumentParsingComplete(sender, EventArgs.Empty)
        End If
    End If
End Sub

使用此 class 来 activate/deactivate WebBrowser 控件的高级功能:

Imports Microsoft.Win32
Imports System.Security.AccessControl

Public Class WebBrowserAdvancedFetures
    Private Shared baseKeyName As String = "Software\Microsoft\Internet Explorer\Main\FeatureControl"
    Private Shared featuresKey As String = baseKeyName & "\FEATURE_BROWSER_EMULATION"
    Private Shared hardwareAccelKey As String = baseKeyName & "\FEATURE_GPU_RENDERING"

    Public Shared Sub ActivateWBAdvancedFeatures(executableName As String)
        Dim wbFeatureKey As RegistryKey = Nothing
        Dim wbAccelKey As RegistryKey = Nothing

        Try
            wbFeatureKey = Registry.CurrentUser.OpenSubKey(featuresKey, 
                RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryRights.WriteKey)
            If wbFeatureKey Is Nothing Then
                wbFeatureKey = Registry.CurrentUser.CreateSubKey(featuresKey, True)
            End If
            wbFeatureKey.SetValue(executableName, 11001, RegistryValueKind.DWord)

            wbAccelKey = Registry.CurrentUser.OpenSubKey(hardwareAccelKey, 
                RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryRights.WriteKey)
            If wbAccelKey Is Nothing Then
                wbAccelKey = Registry.CurrentUser.CreateSubKey(hardwareAccelKey, True)
            End If
            wbAccelKey.SetValue(executableName, 1, RegistryValueKind.DWord)
        Finally
            wbFeatureKey?.Dispose()
            wbAccelKey?.Dispose()
        End Try
    End Sub

    Public Shared Sub DeactivateWBAdvancedFeatures(executableName As String)
        Using wbFeatureKey = Registry.CurrentUser.OpenSubKey(
            featuresKey, RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryRights.WriteKey)
            wbFeatureKey.DeleteValue(executableName, False)
        End Using

        Using wbAccelKey = Registry.CurrentUser.OpenSubKey(
            hardwareAccelKey, RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryRights.WriteKey)
            wbAccelKey.DeleteValue(executableName, False)
        End Using
    End Sub
End Class