Selenium and revolving containers
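There is a page with a table and a "next" button that refreshes the table. I can already extract the table's contents, but I need to use the "next" button to move on to the other rows. It is some kind of AJAX table, with no href that reloads the page, so I am stuck. The page is https://www.whoscored.com/Regions/252/Tournaments/2/Seasons/6335/Stages/13796/PlayerStatistics/England-Premier-League-2016-2017.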
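I would do the following:
- start an endless loop
- click the "next" button; if that fails, exit the loop (this is your "break" condition)
- wait for the table's loading wrapper to become invisible
- collect the player data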
Example implementation (using selenium only, though you should probably involve BeautifulSoup for parsing the player data, which should be much faster):
from pprint import pprint

from selenium import webdriver
from selenium.common.exceptions import ElementNotVisibleException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

root = "https://www.whoscored.com/Regions/252/Tournaments/2/Seasons/6335/Stages/13796/PlayerStatistics/England-Premier-League-2016-2017"

# PhantomJS support was removed in Selenium 4; on newer versions
# substitute another driver here, e.g. webdriver.Chrome()
driver = webdriver.PhantomJS()
driver.get(root)

# wait until the first page of the table has rendered
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#statistics-table-summary .player-link")))

# get the first 10 players
players = [player.text for player in driver.find_elements(By.CSS_SELECTOR, "#statistics-table-summary .player-link")]

while True:
    try:
        # click Next
        driver.find_element(By.LINK_TEXT, "next").click()
    except ElementNotVisibleException:
        break  # next is not present/visible

    # wait for the loading wrapper to disappear before reading the table
    wait.until(EC.invisibility_of_element_located((By.ID, "statistics-table-summary-loading")))

    # collect the next 10 players
    players += [player.text for player in driver.find_elements(By.CSS_SELECTOR, "#statistics-table-summary .player-link")]

print(len(players))
pprint(players)

driver.close()
Note that, as far as parsing goes, you can improve performance by using SoupStrainer to parse only the relevant table.
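As a rough illustration of the SoupStrainer suggestion, here is a minimal sketch that parses only the element with id "statistics-table-summary" and pulls the player names out of it. The SAMPLE_HTML string and the extract_players helper are hypothetical stand-ins made up for this example; the real page's markup may differ, and in practice the HTML would come from driver.page_source.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical stand-in for driver.page_source; the real markup may differ.
SAMPLE_HTML = """
<html><body>
  <div id="header"><a class="player-link" href="#">Not a table row</a></div>
  <div id="statistics-table-summary">
    <table>
      <tr><td><a class="player-link" href="#">Player A</a></td></tr>
      <tr><td><a class="player-link" href="#">Player B</a></td></tr>
    </table>
  </div>
</body></html>
"""


def extract_players(html):
    """Return the player names found inside the summary table only."""
    # Restrict parsing to the element with this id (and its descendants),
    # so the rest of the page is never turned into a parse tree.
    only_table = SoupStrainer(id="statistics-table-summary")
    soup = BeautifulSoup(html, "html.parser", parse_only=only_table)
    return [link.get_text(strip=True)
            for link in soup.find_all("a", class_="player-link")]


players = extract_players(SAMPLE_HTML)
print(players)
```

Inside the loop, something like players += extract_players(driver.page_source) could then take the place of the per-element .text lookups, assuming the table keeps the same id and link class on every page.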