How to go to the next page while crawling using bs4 & selenium?

What I'm trying to do is build a license plate dataset by scraping a website, downloading the car images, and then labeling them. This is the code I'm using:

import os
import time

import requests
from bs4 import BeautifulSoup
from selenium import webdriver


def image_downloader(url, folder):
    # Create the target folder (ignore it if it already exists) and work inside it
    try:
        os.mkdir(os.path.join(os.getcwd(), folder))
    except FileExistsError:
        pass
    os.chdir(os.path.join(os.getcwd(), folder))
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    images = soup.find_all('img')
    count = 1
    for image in images:
        link = image['src']
        # str.find() returns 0 only when the src starts with the site URL,
        # so images hosted elsewhere are skipped
        if link.find('https://iranpelak.com/'):
            pass
        else:
            with open(str(count) + '.jpg', 'wb') as f:
                im = requests.get(link)
                f.write(im.content)
                print('Writing.. ', str(count))
                count += 1

iranpelak_url = 'https://iranpelak.com/car-search'
image_downloader(iranpelak_url, 'car_images')


page_number = 2
while page_number != 10000:
    # Open a fresh browser, load the search page, and click the page-number link
    driver = webdriver.Chrome(r"./chromedriver")
    driver.get("https://iranpelak.com/car-search")
    driver.maximize_window()
    time.sleep(5)
    button = driver.find_element_by_link_text(str(page_number))
    button.click()
    time.sleep(10)
    image_downloader(iranpelak_url, 'car_images')
    page_number += 1

The point is to automatically go to the next page and download all of the images. It doesn't work, and I can't figure out what I'm doing wrong. Any help would be appreciated!

It was observed that when the website opens, there is an alert popup titled "Search Page".

You need to close that alert window before you can click any other elements.

With the code below, it does click the page number without an ElementClickInterceptedException. The popup is handled in a try-except block in case no alert appears.

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time

page_number = 2
while page_number != 10000:
    driver = webdriver.Chrome(executable_path='path to chromedriver.exe')
    driver.get("https://iranpelak.com/car-search")
    driver.maximize_window()
    time.sleep(5)
    try:
        # Dismiss the "Search Page" popup so the pagination links become clickable
        WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, "//button[@data-role='end']"))
        ).click()
    except Exception as e:
        # No popup appeared; nothing to dismiss
        print(e)
    button = driver.find_element_by_link_text(str(page_number))
    button.click()
    time.sleep(10)
    # image_downloader(iranpelak_url, 'car_images')
    page_number += 1
    driver.quit()  # close this browser before opening a new one on the next iteration
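
One thing to keep in mind if you re-enable the download step: image_downloader(iranpelak_url, 'car_images') re-fetches the first page with requests, so it never sees the page Selenium just clicked to. Below is a minimal sketch of parsing driver.page_source instead; the helper name download_images_from_page and the next_count variable are made up for illustration, and the src filter mirrors the one in the question.

import os

import requests
from bs4 import BeautifulSoup


def download_images_from_page(html, folder, start_count=1):
    # Parse the HTML that Selenium has already rendered instead of re-fetching page 1
    os.makedirs(folder, exist_ok=True)
    soup = BeautifulSoup(html, 'html.parser')
    count = start_count
    for image in soup.find_all('img'):
        link = image.get('src', '')
        # Keep only images whose src starts with the site URL, like the original filter
        if not link.startswith('https://iranpelak.com/'):
            continue
        im = requests.get(link)
        with open(os.path.join(folder, str(count) + '.jpg'), 'wb') as f:
            f.write(im.content)
        print('Writing.. ', count)
        count += 1
    return count  # return the next index so later pages keep numbering files


# Inside the while loop, after button.click() and the sleep:
# next_count = download_images_from_page(driver.page_source, 'car_images', next_count)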