Is it possible to have multiple explicit waits when using Selenium with Python?

I'm fairly new to Python and Selenium.

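My goal is to automate a Google search for a phrase, click the first image on the image results page, wait for the larger image to load, and then download that larger image and save it to a local directory. (The idea is to save a higher-quality version of the image than the one that initially appears in the search results.)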

Here is my code, which downloads only the initial "smaller" image. (I've omitted all of the imports and so on for brevity):

PATH = "/path/to/chromedriver"

save_folder = "../Album-Artwork"
seconds = [1, 2, 3, 4, 5]

if not os.path.exists(save_folder):
    os.mkdir(save_folder)

driver = webdriver.Chrome(PATH)

search_terms = ["John Coltrane Blue Train Album Cover",
                "The Silver Seas Chateau Revenge! Album Cover"]

count = 0

for term in search_terms:

    driver.get("https://www.google.com/imghp?hl=en&ogbl")

    # "q" is the name of the google search field input
    search_bar = driver.find_element_by_name("q")

    search_bar.send_keys(term)
    search_bar.send_keys(Keys.RETURN)

    try:
        search_results = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "islrg"))
        )

        # Gets all of the images on the page (it should be a list)
        images = search_results.find_elements_by_tag_name("img")

        # I just want the first result.
        data_url = images[0].get_attribute('src')

        # Read the dataURL and decode it to bytes
        with urllib.request.urlopen(data_url) as response:
            data = response.read()
            with open(f"{save_folder}/{count}image.jpg", mode="wb") as f:
                f.write(data)

        # This will print if the above succeeds
        print("Artwork Saved")

        count += 1
        sleep(random.choice(seconds))

    except:
        print("Error")
        driver.quit()

driver.quit()

But when I add another wait, so that the script waits for the large image to load after the click, as in this version:


PATH = "/path/to/chromedriver"

save_folder = "../Album-Artwork"
seconds = [1, 2, 3, 4, 5]

if not os.path.exists(save_folder):
    os.mkdir(save_folder)

driver = webdriver.Chrome(PATH)

search_terms = ["John Coltrane Blue Train Album Cover",
                "The Silver Seas Chateau Revenge! Album Cover"]

count = 0

for term in search_terms:

    driver.get("https://www.google.com/imghp?hl=en&ogbl")
    search_bar = driver.find_element_by_name("q")
    search_bar.send_keys(term)
    search_bar.send_keys(Keys.RETURN)

    try:

        search_results = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "islrg"))
        )

        images = search_results.find_elements_by_tag_name("img")

######## DIFFERENT CODE FROM PREVIOUS SNIPPET BEGINS HERE ########

        images[0].click()
        
        # Wait for the larger image to load
        new_search_results = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "n3VNCb"))
        )

        large_image = new_search_results.find_element_by_class_name("n3VNCb")

        source = large_image.get_attribute('src')

        # Download and save the image
        urllib.urlretrieve(source, f"{save_folder}/{count}image.jpg")

######## DIFFERENT CODE FROM PREVIOUS SNIPPET ENDS HERE ########

        print("Artwork Saved")

        count += 1
        sleep(random.choice(seconds))

    except:

        print("Error")
        driver.quit()

driver.quit()

I get this error:

urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=50518): Max retries exceeded with url: /session/3bb2a509ad09817b8e786b2b1ebcecae/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x104b36880>: Failed to establish a new connection: [Errno 61] Connection refused'))

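From some preliminary research, it seems this error can be avoided by using "sleep" or a similar method to slow down Selenium's rapid requests. I already use sleep here, so I'm not sure that is the problem.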

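It appears that the "src" of the "smaller" image is a data URL, while the "src" of the larger image is a regular URL. I'm not sure whether that is related to the problem I'm facing.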
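For reference, both kinds of "src" value can be downloaded the same way. The following is a minimal sketch rather than part of the original code: save_image_from_src is a hypothetical helper name, and it relies on urllib.request.urlopen in Python 3 accepting "data:" URLs as well as ordinary http(s) URLs.

```python
import os
import tempfile
import urllib.request


def save_image_from_src(src, path):
    """Save the image that an <img> "src" value points to.

    urlopen() in Python 3 handles "data:" URLs (the inline thumbnails) as
    well as ordinary http(s) URLs (the full-size images).
    """
    with urllib.request.urlopen(src) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


# Demo with a tiny inline data URL, so no network access is needed.
demo_src = "data:application/octet-stream;base64,aGVsbG8="
demo_path = os.path.join(tempfile.mkdtemp(), "0image.jpg")
bytes_written = save_image_from_src(demo_src, demo_path)
```

If this holds, the data URL versus regular URL difference would not by itself explain the connection error, since the same call covers both cases.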

I'll keep researching, but does anyone have any insight here?

To get this code working, I had to remove the variable being created here:

        new_search_results = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "n3VNCb"))
        )

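because it seems to have caused the urllib error for the next variable, which used that wait result to find the same element again. Instead, I kept using the driver itself to find the larger image, and then passed its "src" to the urllib request to download the image. See the full code below: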

import urllib.request
import random
import os
import time
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.keys import Keys

save_folder = "/Users/name/Documents/"
seconds = [1, 2, 3, 4, 5]

if not os.path.exists(save_folder):
    os.mkdir(save_folder)

optionsforchrome = Options()
optionsforchrome.add_argument('--no-sandbox')
optionsforchrome.add_argument('--start-maximized')
optionsforchrome.add_argument('--disable-extensions')
optionsforchrome.add_argument('--disable-dev-shm-usage')
optionsforchrome.add_argument('--ignore-certificate-errors')
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=optionsforchrome)

search_terms = ["John Coltrane Blue Train Album Cover",
                "The Silver Seas Chateau Revenge! Album Cover"]

count = 0

for term in search_terms:
    driver.get("https://www.google.com/imghp?hl=en&ogbl")
    search_bar = driver.find_element(By.NAME, "q")
    search_bar.send_keys(term)
    search_bar.send_keys(Keys.RETURN)
    try:
        search_results = WebDriverWait(driver, 10).until(ec.presence_of_element_located((By.XPATH, '//a[@class="wXeWr islib nfEiy mM5pbd"]')))
        images = search_results.find_elements(By.TAG_NAME, "img")
        ######## DIFFERENT CODE FROM PREVIOUS SNIPPET BEGINS HERE ########
        images[0].click()
        # Wait for the larger image to load
        WebDriverWait(driver, 10).until(ec.presence_of_element_located((By.CLASS_NAME, "n3VNCb")))
        large_image = driver.find_element(By.CLASS_NAME, "n3VNCb")
        source = large_image.get_attribute('src')
        # Download and save the image
        urllib.request.urlretrieve(source, f"{save_folder}/{count}image.jpg")
        ######## DIFFERENT CODE FROM PREVIOUS SNIPPET ENDS HERE ########
        print("Artwork Saved")
        count += 1
        time.sleep(random.choice(seconds))
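    except Exception as exc:
        # Report the failure but keep the session open for the next term;
        # quitting here would make the next driver.get() call fail.
        print(f"Error: {exc!r}")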

driver.quit()

Note that I'm using Service and Options objects, along with the webdriver_manager library, in my code. You may need to change these to get your code working.
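A possible explanation for the connection-refused error, offered as an assumption rather than a confirmed diagnosis: in the question's snippets the except block calls driver.quit() but the for loop keeps running, so the next driver.get() talks to a browser session that has already been shut down. The failing URL in the traceback ends in /session/.../url, which is the WebDriver endpoint used for navigation. Under that reading, having two explicit waits is not the problem in itself. The sketch below shows the control flow only; run_searches, handle_term, demo_handler and FakeDriver are hypothetical names, and FakeDriver stands in for a real webdriver.Chrome so the example runs without a browser.

```python
def run_searches(driver, search_terms, handle_term):
    """Run handle_term(driver, term, index) per term and quit the driver once."""
    failed_terms = []
    try:
        for index, term in enumerate(search_terms):
            try:
                handle_term(driver, term, index)
            except Exception as exc:
                # Report the failure and continue; the session stays open
                # for the remaining terms.
                print(f"Error for {term!r}: {exc!r}")
                failed_terms.append(term)
    finally:
        # Single shutdown point, reached whether or not anything failed.
        driver.quit()
    return failed_terms


class FakeDriver:
    """Stand-in for webdriver.Chrome that only tracks calls (for the demo)."""

    def __init__(self):
        self.visited = []
        self.quit_calls = 0

    def get(self, url):
        if self.quit_calls:
            raise RuntimeError("session already closed")
        self.visited.append(url)

    def quit(self):
        self.quit_calls += 1


def demo_handler(driver, term, index):
    # Placeholder for the real per-term work (search, waits, download).
    driver.get(f"https://example.com/search?q={index}")
    if index == 0:
        raise ValueError("simulated failure while saving the image")


demo_driver = FakeDriver()
failed = run_searches(demo_driver, ["first term", "second term"], demo_handler)
```

With a real driver, handle_term would hold the search, both WebDriverWait calls, and the download. Printing the exception instead of a bare "Error" also shows which step actually failed.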