Is it possible to have multiple explicit waits when using Selenium with Python?
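I'm fairly new to Python and Selenium.
My goal is to automate the process of searching Google for a phrase, clicking the first image on the image results page, waiting for the larger image to load, and then downloading that larger image and saving it to a local directory. (The idea is to save a higher-quality version of the image than the one that initially appears in the search results.)
Here is my code, which downloads only the initial "smaller" image. (I've omitted all the imports etc. for brevity):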
PATH = "/path/to/chromedriver"
save_folder = "../Album-Artwork"
seconds = [1, 2, 3, 4, 5]
if not os.path.exists(save_folder):
    os.mkdir(save_folder)
driver = webdriver.Chrome(PATH)
search_terms = ["John Coltrane Blue Train Album Cover",
                "The Silver Seas Chateau Revenge! Album Cover"]
count = 0
for term in search_terms:
    driver.get("https://www.google.com/imghp?hl=en&ogbl")
    # "q" is the name of the google search field input
    search_bar = driver.find_element_by_name("q")
    search_bar.send_keys(term)
    search_bar.send_keys(Keys.RETURN)
    try:
        search_results = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "islrg"))
        )
        # Gets all of the images on the page (it should be a list)
        images = search_results.find_elements_by_tag_name("img")
        # I just want the first result.
        data_url = images[0].get_attribute('src')
        # Read the dataURL and decode it to bytes
        with urllib.request.urlopen(data_url) as response:
            data = response.read()
            with open(f"{save_folder}/{count}image.jpg", mode="wb") as f:
                f.write(data)
        # This will print if the above succeeds
        print("Artwork Saved")
        count += 1
        sleep(random.choice(seconds))
    except:
        print("Error")
        driver.quit()
driver.quit()
But when I add another "wait" so that the larger image has time to load after the click, as in the code below:
PATH = "/path/to/chromedriver"
save_folder = "../Album-Artwork"
seconds = [1, 2, 3, 4, 5]
if not os.path.exists(save_folder):
    os.mkdir(save_folder)
driver = webdriver.Chrome(PATH)
search_terms = ["John Coltrane Blue Train Album Cover",
                "The Silver Seas Chateau Revenge! Album Cover"]
count = 0
for term in search_terms:
    driver.get("https://www.google.com/imghp?hl=en&ogbl")
    search_bar = driver.find_element_by_name("q")
    search_bar.send_keys(term)
    search_bar.send_keys(Keys.RETURN)
    try:
        search_results = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "islrg"))
        )
        images = search_results.find_elements_by_tag_name("img")
        ######## DIFFERENT CODE FROM PREVIOUS SNIPPET BEGINS HERE ########
        images[0].click()
        # Wait for the larger image to load
        new_search_results = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "n3VNCb"))
        )
        large_image = new_search_results.find_element_by_class_name("n3VNCb")
        source = large_image.get_attribute('src')
        # Download and save the image
        urllib.urlretrieve(source, f"{save_folder}/{count}image.jpg")
        ######## DIFFERENT CODE FROM PREVIOUS SNIPPET ENDS HERE ########
        print("Artwork Saved")
        count += 1
        sleep(random.choice(seconds))
    except:
        print("Error")
        driver.quit()
driver.quit()
I get this error:
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=50518): Max retries exceeded with url: /session/3bb2a509ad09817b8e786b2b1ebcecae/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x104b36880>: Failed to establish a new connection: [Errno 61] Connection refused'))
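From some preliminary research, it seems the error above can be avoided by using "sleep" or a similar method to "slow down" Selenium's rapid-fire requests. I already call sleep several times here, so I'm not sure that is the problem.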
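One other reading of the traceback, offered as a guess rather than a diagnosis: the failing request goes to `/session/<id>/url`, which is the WebDriver endpoint used for navigation (`driver.get`), and "Connection refused" suggests nothing is listening on the chromedriver port any more. That would fit a loop where the `except` branch calls `driver.quit()` and the next iteration then calls `driver.get(...)` on the dead session. This assumes the `except` block sits inside the `for` loop, which the flattened indentation makes likely but not certain. A minimal sketch of a loop shape that avoids this, where `run_searches` and `handle_term` are hypothetical names and not part of Selenium:

```python
import traceback


def run_searches(driver, terms, handle_term):
    """Call handle_term(driver, term) for every term; quit the driver once.

    Returns the number of terms that were handled without an exception.
    """
    saved = 0
    try:
        for term in terms:
            try:
                handle_term(driver, term)
                saved += 1
            except Exception:
                # Print the real exception instead of a bare "Error", and do
                # NOT quit here: the next term still needs a live session.
                traceback.print_exc()
    finally:
        # Runs exactly once, whether the loop finished or blew up.
        driver.quit()
    return saved
```

Printing the traceback also shows which call actually failed first (for example a missing element or a missing `urllib` attribute), which a bare `except:` with `print("Error")` hides.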
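It also looks like the "src" of the "smaller" image is a data URL, whereas the "src" of the larger image is a regular URL. I'm not sure whether that has anything to do with the problem I'm facing.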
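For what it's worth, the two kinds of `src` can be saved by the same helper. A data URL carries the image bytes inside the string itself (usually base64-encoded after the first comma), while a regular URL has to be fetched over the network. The sketch below is only an illustration; `save_image` is a made-up name, and it assumes the `src` value is either a well-formed `data:` URL or something `urllib.request.urlopen` can open:

```python
import base64
import urllib.parse
import urllib.request


def save_image(src, path):
    """Write the image behind src to path and return the byte count.

    src may be a data: URL (bytes embedded in the string) or an
    http(s) URL (bytes fetched over the network).
    """
    if src.startswith("data:"):
        # Layout: data:<mediatype>[;base64],<payload>
        header, _, payload = src.partition(",")
        if header.endswith(";base64"):
            data = base64.b64decode(payload)
        else:
            # Non-base64 data URLs are percent-encoded.
            data = urllib.parse.unquote_to_bytes(payload)
    else:
        with urllib.request.urlopen(src) as response:
            data = response.read()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

In the loop above this would be called as `save_image(source, f"{save_folder}/{count}image.jpg")` once `source` has been read from the element.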
I'll keep researching, but does anyone have any insight here?
To get this code working, I had to remove the variable being created:
new_search_results = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "n3VNCb"))
)
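because it seems to have led to the urllib error in the next variable, which used the result of that wait to look up the same element again. So I kept using the driver itself to find the larger image, and then passed its "src" to a urllib request to download the image. See the full code below: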
import urllib.request  # plain "import urllib" does not reliably expose urllib.request
import random
import os
import time
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.keys import Keys

save_folder = "/Users/name/Documents/"
seconds = [1, 2, 3, 4, 5]
if not os.path.exists(save_folder):
    os.mkdir(save_folder)

optionsforchrome = Options()
optionsforchrome.add_argument('--no-sandbox')
optionsforchrome.add_argument('--start-maximized')
optionsforchrome.add_argument('--disable-extensions')
optionsforchrome.add_argument('--disable-dev-shm-usage')
optionsforchrome.add_argument('--ignore-certificate-errors')
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=optionsforchrome)

search_terms = ["John Coltrane Blue Train Album Cover",
                "The Silver Seas Chateau Revenge! Album Cover"]
count = 0
for term in search_terms:
    driver.get("https://www.google.com/imghp?hl=en&ogbl")
    # "q" is the name of the Google search field input
    search_bar = driver.find_element(By.NAME, "q")
    search_bar.send_keys(term)
    search_bar.send_keys(Keys.RETURN)
    try:
        # Wait for the first result link, then collect the images inside it
        search_results = WebDriverWait(driver, 10).until(ec.presence_of_element_located((By.XPATH, '//a[@class="wXeWr islib nfEiy mM5pbd"]')))
        images = search_results.find_elements(By.TAG_NAME, "img")
        ######## DIFFERENT CODE FROM PREVIOUS SNIPPET BEGINS HERE ########
        images[0].click()
        # Wait for the larger image to load
        WebDriverWait(driver, 10).until(ec.presence_of_element_located((By.CLASS_NAME, "n3VNCb")))
        large_image = driver.find_element(By.CLASS_NAME, "n3VNCb")
        source = large_image.get_attribute('src')
        # Download and save the image
        urllib.request.urlretrieve(source, f"{save_folder}/{count}image.jpg")
        ######## DIFFERENT CODE FROM PREVIOUS SNIPPET ENDS HERE ########
        print("Artwork Saved")
        count += 1
        time.sleep(random.choice(seconds))
    except Exception as exc:
        # Report what failed. The driver is not quit here, because the next
        # search term still needs the same browser session.
        print(f"Error: {exc}")
driver.quit()
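Note that my code uses Service and Options objects together with the webdriver_manager library. You may need to change these to get the code working in your own setup.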