How to go to the next page while crawling using bs4 & selenium?
What I'm trying to do is build a license-plate dataset by crawling the site, downloading the car images, and then labelling them.
Here is the code I'm using:
import os
import time

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

def image_downloader(url, folder):
    try:
        os.mkdir(os.path.join(os.getcwd(), folder))
    except:
        pass
    os.chdir(os.path.join(os.getcwd(), folder))
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    images = soup.find_all('img')
    count = 1
    for image in images:
        link = image['src']
        if link.find('https://iranpelak.com/'):
            pass
        else:
            with open(str(count) + '.jpg', 'wb') as f:
                im = requests.get(link)
                f.write(im.content)
                print('Writing.. ', str(count))
            count += 1

iranpelak_url = 'https://iranpelak.com/car-search'
image_downloader(iranpelak_url, 'car_images')

page_number = 2
while page_number != 10000:
    driver = webdriver.Chrome(r"./chromedriver")
    driver.get("https://iranpelak.com/car-search")
    driver.maximize_window()
    time.sleep(5)
    button = driver.find_element_by_link_text(str(page_number))
    button.click()
    time.sleep(10)
    image_downloader(iranpelak_url, 'car_images')
    page_number += 1
The point is to go to the next page automatically and download all the images.
It doesn't work, and I can't figure out what I'm doing wrong.
Any help would be appreciated!
It was observed that when the site opens, an alert popup titled "Search Page" appears. You need to dismiss that alert window before you can click any other elements. With the code below, the page numbers are clicked without an ElementClickInterceptedException. The popup is handled in a try-except block in case the alert does not appear.
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time

page_number = 2
while page_number != 10000:
    driver = webdriver.Chrome(executable_path='path to chromedriver.exe')
    driver.get("https://iranpelak.com/car-search")
    driver.maximize_window()
    time.sleep(5)
    try:
        # Dismiss the "Search Page" alert popup if it appears.
        WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, "//button[@data-role='end']"))
        ).click()
    except Exception as e:
        print(e)
    button = driver.find_element_by_link_text(str(page_number))
    button.click()
    time.sleep(10)
    # image_downloader(iranpelak_url, 'car_images')
    page_number += 1
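One more thing worth noting (an assumption about the intended workflow): after Selenium clicks through to the next page, calling image_downloader with requests.get(iranpelak_url) fetches page 1 again, not the page the browser is on. You would need to parse the driver's rendered HTML (driver.page_source) instead. Also, link.find('https://iranpelak.com/') returns 0 (falsy) on a match at the start of the string and -1 (truthy) when there is no match, which makes the filter easy to misread; str.startswith says the same thing plainly. A minimal sketch, using a hypothetical helper name collect_image_links:

```python
from bs4 import BeautifulSoup

def collect_image_links(html, prefix='https://iranpelak.com/'):
    """Return img src links from already-rendered HTML (e.g. driver.page_source)."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for image in soup.find_all('img'):
        src = image.get('src', '')
        # startswith is clearer than src.find(prefix): find() returns 0
        # (falsy) for a match at position 0 and -1 (truthy) for no match.
        if src.startswith(prefix):
            links.append(src)
    return links
```

Inside the while loop you could then call collect_image_links(driver.page_source) after the click and download each returned link with requests, so the images actually come from the page Selenium navigated to.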