Iterating with selenium through pages

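This is the first site I've scraped, and the other solutions I found didn't seem very helpful. As you'll see, the "Next" button stays visible, but its CSS changes slightly when you reach the last page.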

A few notes: I'm using Python, Selenium, and Google Chrome.

I'm trying to iterate through each section of the table on this page: https://caearlyvoting.sos.ca.gov/

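I've figured out how to iterate through each county and grab the information I need (I think). However, when a table has more records than the 10 shown by default, I'm stuck on how to move to the next page.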
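As a sanity check on how much paging to expect, the arithmetic is a ceiling division. This is a minimal sketch assuming the table keeps the default page size of 10 mentioned above; `pages_needed` and `next_clicks_needed` are hypothetical helper names, not part of Selenium or the site.

```python
def pages_needed(total_records: int, page_size: int = 10) -> int:
    """Pages a paginated table needs to show total_records rows, page_size rows per page."""
    if total_records < 0 or page_size <= 0:
        raise ValueError("need total_records >= 0 and page_size > 0")
    # Integer ceiling division: (a + b - 1) // b
    return (total_records + page_size - 1) // page_size


def next_clicks_needed(total_records: int, page_size: int = 10) -> int:
    """Clicks on "Next" needed to get from the first page to the last one."""
    return max(pages_needed(total_records, page_size) - 1, 0)


# Example: 23 records at 10 per page means 3 pages, reached with 2 clicks on "Next"
print(pages_needed(23), next_clicks_needed(23))
```

The pagination loops in the answers never need this count, because they stop when the Next button becomes disabled; it is only useful for checking that a scrape visited the number of pages you expected.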

I've tried variations of this:

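from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

try:
    next_page = driver.find_element(By.CLASS_NAME, 'paginate_button')
    next_page.click()
except NoSuchElementException:
    pass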

But no luck. I've tried locating the element in different ways, but I run into the same problem.

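Can anyone help me figure out how to click through each page, grab what I need, and then move on to the next county? I don't need help extracting the information from the table, only with clicking through the pages and then moving on to the next county.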
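The control flow being asked for (an outer loop over counties, and an inner loop that scrapes the current page before clicking Next) can be sketched without Selenium. In this minimal sketch, `scrape_counties` and the callables it takes (`select_county`, `read_rows`, `next_is_disabled`, `click_next`) are hypothetical names; in real code each would wrap the corresponding Selenium calls. `FakeTable` and its county names and rows are made up, standing in for the live page.

```python
from typing import Callable, Dict, Iterable, List


def scrape_counties(
    counties: Iterable[str],
    select_county: Callable[[str], None],
    read_rows: Callable[[], List[dict]],
    next_is_disabled: Callable[[], bool],
    click_next: Callable[[], None],
) -> Dict[str, List[dict]]:
    """Outer loop over counties, inner loop over the table's pages."""
    results: Dict[str, List[dict]] = {}
    for county in counties:
        select_county(county)          # e.g. enter the name in the county selector
        rows: List[dict] = []
        while True:
            rows.extend(read_rows())   # grab the current page before paging on
            if next_is_disabled():     # last page reached for this county
                break
            click_next()
        results[county] = rows
    return results


class FakeTable:
    """Made-up stand-in for the page: each county maps to a list of pages of rows."""

    def __init__(self, pages_by_county: Dict[str, List[List[dict]]]):
        self.pages_by_county = pages_by_county
        self.county = None
        self.page = 0
        self.clicks = 0

    def select_county(self, county: str) -> None:
        self.county, self.page = county, 0   # selecting a county resets to page 1

    def read_rows(self) -> List[dict]:
        return self.pages_by_county[self.county][self.page]

    def next_is_disabled(self) -> bool:
        return self.page >= len(self.pages_by_county[self.county]) - 1

    def click_next(self) -> None:
        self.page += 1
        self.clicks += 1


fake = FakeTable({
    "County A": [[{"site": "A1"}, {"site": "A2"}], [{"site": "A3"}]],
    "County B": [[{"site": "B1"}]],
})
results = scrape_counties(["County A", "County B"], fake.select_county,
                          fake.read_rows, fake.next_is_disabled, fake.click_next)
print(results)
```

Keeping the page logic behind small callables makes the loop structure testable offline; only the callables need to change when the site's markup does.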

EDIT: Here is the rest of the code, updated based on the answer below. I'm having trouble putting it all together.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
import pandas as pd
import time  # not for production

# Names of the counties: a single column with county names
county_df = pd.read_csv('Counties.csv')

# Path to the driver on this computer
chrome_driver_path = r'C:\Windows\chromedriver'

# URL to scrape
url = 'https://caearlyvoting.sos.ca.gov/'

with webdriver.Chrome(service=Service(chrome_driver_path)) as driver:
    # Open the window, maximize it and set an implicit wait
    driver.get(url)
    driver.maximize_window()
    driver.implicitly_wait(10)
    actions = ActionChains(driver)  # New line here, from Stack Overflow
    # Find the county selector
    county_selector = driver.find_element(By.ID, 'CountyID')
    # Loop through the counties (first five only)
    for county in county_df['County'][:5]:
        # Enter the county name
        county_selector.send_keys(county)
        ### Code to grab data goes here

        ######## Code from Stack Overflow ########
        while True:
            next_page = driver.find_element(By.CSS_SELECTOR, ".paginate_button.next")
            next_btn_classes = next_page.get_attribute("class")
            if "disabled" in next_btn_classes:
                break  # last page reached, no more next pages, break the loop
            else:
                actions.move_to_element(next_page).perform()
                time.sleep(0.5)
                # get the actual next page button and click it
                driver.find_element(By.CSS_SELECTOR, ".paginate_button.next a").click()

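You are using the wrong locator.
The next-page button can also sit outside the visible viewport at the bottom of the page, so you have to scroll to that element before you can click it.
On the last page, the next-page button is disabled.
In that case, its class attribute contains the `disabled` class name.
So your code could be: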

import time

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

actions = ActionChains(driver)

while True:
    # grab the data from the current page, after that:
    next_page = driver.find_element(By.CSS_SELECTOR, ".paginate_button.next")
    next_btn_classes = next_page.get_attribute("class")
    if "disabled" in next_btn_classes:
        break  # last page reached, no more next pages, break the loop
    else:
        # scroll the Next button into view before clicking it
        actions.move_to_element(next_page).perform()
        time.sleep(0.5)
        # get the actual next page button and click it
        driver.find_element(By.CSS_SELECTOR, ".paginate_button.next a").click()
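One caveat about `"disabled" in next_btn_classes`: on a string, `in` is a substring test, so it would also match a longer class name that merely contains "disabled". A stricter check splits the class attribute into whitespace-separated tokens. This is a minimal sketch; `has_class` is a hypothetical helper, and the class `not-disabled` is made up purely to show the difference.

```python
from typing import Optional


def has_class(class_attr: Optional[str], name: str) -> bool:
    """True if `name` is one of the whitespace-separated tokens in a class attribute.

    Selenium's get_attribute("class") returns None when the attribute is missing,
    so None is treated as an empty class list.
    """
    return name in (class_attr or "").split()


# A substring test matches the made-up class "not-disabled"; the token check does not
print("disabled" in "paginate_button next not-disabled")
print(has_class("paginate_button next not-disabled", "disabled"))
print(has_class("paginate_button next disabled", "disabled"))
```

In the loop above, the condition would become `if has_class(next_btn_classes, "disabled"):`.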

UPDATE
The working code turned out to be slightly different:

from selenium.webdriver.common.by import By

while True:
    # grab the data from the current page, after that:
    next_page = driver.find_element(By.CSS_SELECTOR, ".paginate_button.next")
    next_btn_classes = next_page.get_attribute("class")
    if next_btn_classes == 'paginate_button next disabled':
        break  # last page reached, no more next pages, break the loop
    else:
        # Move to the next page for the county and append the data
        next_page.click()
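Because this loop compares the whole class string exactly, it keeps clicking forever if the site ever orders or spaces its classes differently and the string never matches. One defensive variant caps the number of pages and checks for the `disabled` token instead. This is a minimal sketch under those assumptions: `paginate`, its parameters, and the `max_pages` default of 500 are hypothetical, and the callables would wrap the Selenium calls shown above.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def paginate(
    next_classes: Callable[[], Optional[str]],
    click_next: Callable[[], None],
    scrape_page: Callable[[], T],
    max_pages: int = 500,
) -> List[T]:
    """Scrape every page, clicking Next until its class list contains 'disabled'.

    Raises RuntimeError instead of clicking forever if the last page is never detected.
    """
    pages: List[T] = []
    for _ in range(max_pages):
        pages.append(scrape_page())                # grab the current page first
        classes = (next_classes() or "").split()   # None -> no classes
        if "disabled" in classes:                  # token check, order-independent
            return pages
        click_next()
    raise RuntimeError(f"Next button still enabled after {max_pages} pages")


# Made-up three-page table; "disabled" comes first to show class order does not matter
state = {"page": 1}


def fake_classes() -> str:
    return "disabled paginate_button next" if state["page"] == 3 else "paginate_button next"


def fake_click() -> None:
    state["page"] += 1


print(paginate(fake_classes, fake_click, lambda: state["page"]))
```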