Web page navigation reverting back to page 1

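I am extracting some Yellow Pages data, and the extraction itself works well. My problem is with page navigation. The code navigates fine from page 1 to page 2, but when it tries to go on to page 3 it returns to page 1 and extracts the data again. The data extraction is fine; the problem is the navigation.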

YellowPage.ca

Here is what I have identified as the likely cause, but I don't know how to fix it.

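When the browser reaches page 2, the 'emptyPageButton' element is replaced by a link with the same class as the link used to navigate to the next page. So instead of advancing to page 3, the code goes back to page 1. If I specify that 10 pages should be extracted, it extracts pages 1 and 2 five times each, because it moves back and forth between the two pages.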

I have tried several variations and none of them work. I can reach page 2, but then it returns to page 1.

WITH CLASS: works up to page 2, then returns to page 1

''' Searches Number of Pages entered in Sheet20 range J9

    If pageNumber >= Replace(Worksheets("Sheet20").Range("J9").Value, "", "+") Then Exit Do
       Set nextPageElement = HTML.getElementsByClassName("ypbtn btn-theme pageButton ")(0)
       'Set nextPageElement = HTML.getElementsByClassName("ypbtn btn-theme pageButton ")(1)
       'Set nextPageElement = HTML.getElementsByClassName("ypbtn btn-theme pageButton ")(0).children (0)
       'Set nextPageElement = HTML.getElementsByClassName("ypbtn btn-theme pageButton ")(1).children (0)
       'Set nextPageElement = HTML.getElementsByClassName("ypbtn btn-theme pageButton ")(1).children (1)
       'Set nextPageElement = HTML.getElementsByClassName("view_more_section_noScroll ")(0).getElementsByTagName("a")(1)
          If nextPageElement Is Nothing Then Exit Do
             nextPageElement.Click 'next web page
             Application.Wait Now + TimeValue("00:00:05")
    

WITH QUERY SELECTOR: works up to page 2, then returns to page 1

''' Searches Number of Pages entered in Sheet20 range J9

    If pageNumber >= Replace(Worksheets("Sheet20").Range("J9").Value, "", "+") Then Exit Do
       Set nextPageElement = HTML.querySelector(".view_more_section_noScroll .pageButton")
          If Not nextPageElement Is Nothing Then
             nextPageElement.Click
             Application.Wait Now + TimeValue("00:00:05")
          Else:
             Exit Do
         End If

HTML snippet from page 1

<div class="view_more_section_noScroll">
  <div class="emptyPageButton"></div>
  <span class="pageCount">
<span class="bold">
1 /
</span>
  <span class="">
37</span>
  </span>
  <a href="/search/si/2/car+dealership/Toronto+ON" data-analytics="{&quot;event_name&quot;:&quot;click - load_more - Serp &quot;,&quot;lk_se_id&quot;:&quot;f32f0ee7-8492-46dd-87da-7b621c162879_Y2FyIGRlYWxlcnNoaXA_VG9yb250byBPTg&quot;,&quot;lk_name&quot;:&quot;next_serp&quot;}"
    class="ypbtn btn-theme pageButton">Next
&gt;&gt;</a>
</div>

HTML snippet from page 2 onward

<div class="view_more_section_noScroll">
  <a href="/search/si/1/car+dealership/Toronto+ON" data-analytics="{&quot;event_name&quot;:&quot;click - previous_page - Serp &quot;,&quot;lk_se_id&quot;:&quot;f32f0ee7-8492-46dd-87da-7b621c162879_Y2FyIGRlYWxlcnNoaXA_VG9yb250byBPTg&quot;,&quot;lk_name&quot;:&quot;previous_serp&quot;}"
    class="ypbtn btn-theme pageButton">&lt;&lt; Previous</a>
  <span class="pageCount">
<span class="bold">
2 /
</span>
  <span class="">
37</span>
  </span>
  <a href="/search/si/3/car+dealership/Toronto+ON" data-analytics="{&quot;event_name&quot;:&quot;click - load_more - Serp &quot;,&quot;lk_se_id&quot;:&quot;f32f0ee7-8492-46dd-87da-7b621c162879_Y2FyIGRlYWxlcnNoaXA_VG9yb250byBPTg&quot;,&quot;lk_name&quot;:&quot;next_serp&quot;}"
    class="ypbtn btn-theme pageButton">Next
&gt;&gt;</a>
</div>

Question: can someone tell me the correct class or querySelector to use for the navigation?

Thanks in advance.

'''########################## Updated on 4 August 2021 ##########################

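The full code is large, so I have cut much of it to make it easier to read, since the only problem is page navigation. This code should give you an idea of what I am trying to do. At the moment it overwrites previously extracted results because I mistakenly deleted something from the code; please ignore that for now, as page navigation is the only issue.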

Private Sub YellowPagesCa()

Dim HTML As htmlDocument
Dim objIE As Object
Dim result As String 'string variable that will hold our result link
Dim pageNumber As Long ' page no.
Dim nextPageElement As Object 'page element
Dim HtmlText As Variant ' for html data
Dim wsSheet As Worksheet ' WorkSheet
Dim wb As Workbook
Dim sht As Worksheet

        Set wb = ThisWorkbook
            Set wsSheet = wb.Sheets("YellowPages")
             Set sht = ThisWorkbook.Worksheets("YellowPages")
              
'+++++ Internet Explorer ++++++
        Set objIE = New InternetExplorer 'initiating a new instance of Internet Explorer and assigning it to objIE
        objIE.Visible = True
            objIE.navigate "https://www.yellowpages.ca/search/si/1/car+dealer/Toronto+ON"
            
        Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop 'wait here a few seconds while the browser is busy
        
        Set HTML = objIE.document
        Set elements = HTML.getElementsByClassName("listing_right_section")

    For Each element In elements
            DoEvents
''' Element 1
        If element.getElementsByClassName("listing__name--link listing__link jsListingName")(0) Is Nothing Then
            wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = "-"
        Else
            HtmlText = element.getElementsByClassName("listing__name--link listing__link jsListingName")(0).href
            wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = HtmlText
        End If
         
    'End If
Next element

    Do

'''###############      PAGE NAVIGATION    ##############

    'Searches Number of Pages entered in
    If pageNumber >= 5 Then Exit Do 'Replace(Worksheets("Sheet20").Range("J9").Value, "", "+") Then Exit Do

    Set nextPageElement = HTML.querySelector(".view_more_section_noScroll .pageButton")
   ' Set nextPageElement = HTML.getElementsByClassName("ypbtn btn-theme pageButton ")(0)
        If Not nextPageElement Is Nothing Then
           nextPageElement.Click
            Application.Wait Now + TimeValue("00:00:05")
        Else:
            Exit Do
        End If

    Do While objIE.Busy = True Or objIE.readyState <> 4
    DoEvents
    Loop
        Set HTML = objIE.document
        pageNumber = pageNumber + 1
  Loop
                
        objIE.Quit ' end and clear browser
            Set objIE = Nothing
            Set HTML = Nothing
            Set nextPageElement = Nothing
            Set HtmlText = Nothing
            Set element = Nothing
        Complete.show
   'End If
  
End Sub

You can loop while

ie.document.querySelectorAll(".pageCount + a").Length <> 0

Within that loop, either click the next button:

ie.document.querySelector(".pageCount + a").click

or navigate to its href:

ie.Navigate2 ie.document.querySelector(".pageCount + a").href

This will terminate when there are no more next buttons.
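
A minimal sketch of that loop, assuming late binding via CreateObject and the same start URL as in the question. The Sub name LoopWhileNextExists and the MAX_PAGES cap are hypothetical, added only for illustration. The selector ".pageCount + a" uses the adjacent sibling combinator, so it matches only the link immediately after the pageCount span (Next); the Previous link sits before that span and is never selected.

```vba
Option Explicit

' Sketch only. LoopWhileNextExists and MAX_PAGES are hypothetical names.
Public Sub LoopWhileNextExists()
    Const MAX_PAGES As Long = 5          ' assumed cap on pages to visit
    Dim ie As Object, nextLink As Object
    Dim pagesVisited As Long

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.Navigate2 "https://www.yellowpages.ca/search/si/1/car+dealership/Toronto+ON"
    Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop   ' 4 = READYSTATE_COMPLETE

    pagesVisited = 1
    Do
        ' extract data from the current page here

        If pagesVisited >= MAX_PAGES Then Exit Do

        ' Next link only: the <a> directly after span.pageCount
        Set nextLink = ie.document.querySelector(".pageCount + a")
        If nextLink Is Nothing Then Exit Do                  ' last page reached

        ie.Navigate2 nextLink.href
        Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop
        pagesVisited = pagesVisited + 1
    Loop

    ie.Quit
End Sub
```

Navigating by href and then waiting on Busy/readyState avoids the need for a fixed Application.Wait after each click.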


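Alternatively, extract the page count from the first page and loop up to that number, substituting the current page number into the URL (e.g. replacing 1 with 2 to get page 2).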

Option Explicit

Public Sub PrintSomeInfo()

    Dim ie As SHDocVw.InternetExplorer, re As Object

    Set ie = New SHDocVw.InternetExplorer
    Set re = CreateObject("VBScript.RegExp")
    
    With re
        .Global = False
        .MultiLine = False
        .Pattern = "(si\/)(\d+)(\/)"
    End With
    
    With ie
    
        .Visible = True
        
        .Navigate2 "https://www.yellowpages.ca/search/si/1/car+dealership/Toronto+ON"
        
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
        
        Dim pageCount As Long, i As Long
        
        pageCount = CLng(.document.querySelector(".pageCount .bold + span").innerText)
        
        'already on page one so just loop from 2 to pageCount
        For i = 2 To pageCount
             
            .Navigate2 re.Replace(.document.url, "$1" & CStr(i) & "$3")
            
            While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
            
            'do something with new page
        Next
        
        Stop
       
        .Quit
    End With

End Sub

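Regex:

The regex pattern matches three groups in the URL: "si/", the current page number, and the trailing "/". The replacement keeps the first and third groups and substitutes the new page number for the second.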
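
To see the URL rewrite in isolation, here is a small sketch that runs without a browser. It is a simplified variant, not the code from the answer: instead of capture groups it matches the whole "si/<number>/" segment and rewrites it literally. The Sub name DemoPageReplace and the target page number 3 are hypothetical.

```vba
Option Explicit

' Sketch only. DemoPageReplace is a hypothetical name.
Public Sub DemoPageReplace()
    Dim re As Object
    Dim url As String

    Set re = CreateObject("VBScript.RegExp")
    re.Global = False
    re.Pattern = "si\/\d+\/"             ' the "si/<page number>/" segment

    url = "https://www.yellowpages.ca/search/si/1/car+dealership/Toronto+ON"

    ' Swap the page number by rewriting the whole matched segment
    Debug.Print re.Replace(url, "si/" & CStr(3) & "/")
    ' prints https://www.yellowpages.ca/search/si/3/car+dealership/Toronto+ON
End Sub
```

Running it from the VBA editor writes the rewritten URL to the Immediate window.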

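Thanks to QHarr's answer, I was able to solve the problem by using part of it. I combined my class-based and querySelector code with part of QHarr's querySelector answer, and I can now navigate through the pages correctly.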

Do
' Searches Number of Pages entered in Sheet20 J9
    If pageNumber >= Replace(Worksheets("Sheet20").Range("J9").Value, "", "+") Then Exit Do
        'Set nextPageElement = HTML.querySelector(".view_more_section_noScroll .pageButton")
        Set nextPageElement = HTML.getElementsByClassName("ypbtn btn-theme pageButton")(0) '' using class and NOT QuerySelector here
        If Not nextPageElement Is Nothing Then
         nextPageElement.document.querySelector(".pageCount + a").Click ''NEW PART
            Application.Wait Now + TimeValue("00:00:05")
        Else:
            Exit Do
        End If