Iterating with selenium through pages
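This is my first time scraping a web page, and the other solutions I have found don't seem to help. As you will see, the "Next" button is still visible when you reach the last page, but its CSS changes slightly.
A few notes: I'm using Python, Selenium, and Google Chrome.
I'm trying to iterate through each section of the table on this page: https://caearlyvoting.sos.ca.gov/
I've figured out how to iterate through each county and grab the information I need (I think). However, when the table has more records than the 10 shown by default, I'm stuck on how to move to the next page.
I've tried variations of this: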
    from selenium.common.exceptions import NoSuchElementException

    try:
        next_page = driver.find_element_by_class_name('paginate_button')
        next_page.click()
    except NoSuchElementException:
        pass
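But no luck. I've tried locating the element in different ways, but I run into the same problem.
Can anyone help me figure out how to click through each page, grab what I need, and then move on to the next county? I don't need help getting the information out of the table, just with clicking through the pages and then moving to the next county.
EDIT
Here is the rest of the code, based on the follow-up. I'm having trouble structuring it.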
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.common.action_chains import ActionChains
    import pandas as pd
    import time  # not for production

    # Names of the counties: a single column with county names
    county_df = pd.read_csv('Counties.csv')
    # Path to driver on this computer
    chrome_driver_path = r'C:\Windows\chromedriver'
    # url to scrape
    url = 'https://caearlyvoting.sos.ca.gov/'

    with webdriver.Chrome(executable_path=chrome_driver_path) as driver:
        # Open window, maximize and set an implicit wait
        driver.get(url)
        driver.maximize_window()
        driver.implicitly_wait(10)
        actions = ActionChains(driver)  #* New line here from Stack Overflow
        # find the county selection
        county_selector = driver.find_element_by_id('CountyID')
        # for loop to move through the counties
        for county in county_df['County'][:5]:
            # Input the county name
            county_selector.send_keys(county)
            ### Code to grab data goes here
            ########* Code from Stack Overflow ########
            while True:
                next_page = driver.find_element_by_css_selector(".paginate_button.next")
                next_bnt_classes = next_page.get_attribute("class")
                if "disabled" in next_bnt_classes:
                    break  # last page reached, no more next pages, break the loop
                else:
                    actions.move_to_element(next_page).perform()
                    time.sleep(0.5)
                    # get the actual next page button and click it
                    driver.find_element_by_css_selector(".paginate_button.next a").click()
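You are using the wrong locator.
The next-page button can also sit out of view at the bottom of the page, so you have to scroll to the element before you can click it.
On the last page the next-page button is disabled. In that state its class attribute contains the disabled class name.
So your code can be: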
    import time
    from selenium.webdriver.common.action_chains import ActionChains

    actions = ActionChains(driver)
    while True:
        # grab the data from the current page, after that:
        next_page = driver.find_element_by_css_selector(".paginate_button.next")
        next_bnt_classes = next_page.get_attribute("class")
        if "disabled" in next_bnt_classes:
            break  # last page reached, no more next pages, break the loop
        else:
            # scroll the button into view before clicking
            actions.move_to_element(next_page).perform()
            time.sleep(0.5)
            # get the actual next page button and click it
            driver.find_element_by_css_selector(".paginate_button.next a").click()
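For newer Selenium releases, where the `find_element_by_*` helpers are no longer available, the same loop might look roughly like the sketch below. It is only an illustration under some assumptions: `walk_pages`, `scrape_page`, and `max_pages` are hypothetical names, the button is located with the generic `find_element(by, value)` call, a JavaScript `scrollIntoView` is swapped in for the `ActionChains` move plus `time.sleep`, and the `.paginate_button.next` element itself is clicked (if the real link is a child element, the selector would need adjusting). Depending on how quickly the table redraws, a wait after the click may still be needed.

```python
from typing import Callable

# selenium's By.CSS_SELECTOR is the plain string "css selector"; it is spelled
# out here so the sketch has no import-time dependency on selenium.
CSS = "css selector"
NEXT_BUTTON = ".paginate_button.next"  # selector taken from the answer above


def walk_pages(driver, scrape_page: Callable[[], None], max_pages: int = 500) -> int:
    """Call scrape_page() once per table page and return the number of pages.

    `driver` is expected to behave like a selenium WebDriver. `max_pages` is a
    hypothetical safety cap so a button that never becomes disabled cannot
    cause an endless loop.
    """
    for page_number in range(1, max_pages + 1):
        scrape_page()  # grab the data from the current page first
        next_button = driver.find_element(CSS, NEXT_BUTTON)
        classes = (next_button.get_attribute("class") or "").split()
        if "disabled" in classes:
            return page_number  # last page reached
        # Bring the button into view before clicking it.
        driver.execute_script(
            "arguments[0].scrollIntoView({block: 'center'});", next_button
        )
        next_button.click()
    raise RuntimeError(f"no disabled 'next' button after {max_pages} pages")
```

Inside the county loop this would be called once per county, for example `walk_pages(driver, grab_rows)`, where `grab_rows` stands for whatever function collects the rows of the page currently shown.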
UPD
The working code turned out slightly different:
    from selenium.webdriver.common.action_chains import ActionChains

    actions = ActionChains(driver)
    while True:
        # grab the data from the current page, after that:
        next_page = driver.find_element_by_css_selector(".paginate_button.next")
        next_bnt_classes = next_page.get_attribute("class")
        if next_bnt_classes == 'paginate_button next disabled':
            break  # last page reached, no more next pages, break the loop
        else:
            # Move to the next page for the county and append the data
            next_page.click()
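Comparing the whole class string works only as long as the page emits the classes in exactly that order, while a plain substring test would also match any longer class name that merely contains "disabled". A small sketch of a token-based check that avoids both problems follows. `is_disabled` is a hypothetical helper, and `disabled-hover` is a made-up class name used only to show the difference.

```python
def is_disabled(class_attr):
    """Return True if 'disabled' is one of the space-separated class tokens."""
    return "disabled" in (class_attr or "").split()


print(is_disabled("paginate_button next disabled"))        # True
print(is_disabled("next disabled paginate_button"))        # True, order does not matter
print(is_disabled("paginate_button next"))                 # False
print(is_disabled("paginate_button next disabled-hover"))  # False, not an exact token
print(is_disabled(None))                                   # False, attribute missing
```

In the loop above, the condition would then read `if is_disabled(next_page.get_attribute("class")):`.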