Saving non-repetitive images while crawling a website in Selenium
Example web page: link
The goal is to download all the images, but each image only once.
Here is the code I am using:
links = []
wait = WebDriverWait(driver, 5)
all_images = wait.until(
    EC.presence_of_all_elements_located((By.XPATH, "//div[contains(@class,'swiper-button-next swiper-button-white')]")))
for image in all_images:
    a = image.get_attribute('style')
    b = a.split("(")[1].split(")")[0].replace('"', '')
    links.append(b)
all_images = wait.until(
    EC.presence_of_all_elements_located((By.XPATH, "//div[contains(@class,'swiper-slide swiper-slide-visible swiper-slide-active swiper-slide-thumb-active')]")))
for image in all_images:
    a = image.get_attribute('style')
    b = a.split("(")[1].split(")")[0].replace('"', '')
    links.append(b)
all_images = wait.until(
    EC.presence_of_all_elements_located((By.XPATH, "//div[contains(@class,'swiper-slide swiper-slide-visible')]")))
for image in all_images:
    a = image.get_attribute('style')
    b = a.split("(")[1].split(")")[0].replace('"', '')
    links.append(b)
index = 1
for i in range(len(links)//2 + 1):
    with open(title.replace(' ', '-') + str(index) + '.jpg', 'wb') as file:
        im = requests.get(links[i])
        file.write(im.content)
        print('Saving image.. ', title + str(index))
    index += 1
The problem is that some images are saved more than once and others are not saved at all, and I don't know what is going wrong.
You are using the wrong locators.
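One way to see why these locators produce duplicates: XPath's contains(@class, '...') is a plain substring test on the class attribute, so the third locator in the question also matches every element that the second one matches. A minimal sketch in plain Python, using the class strings from the question's locators as stand-ins for the real page's attributes (the actual attributes on the page are an assumption here):

```python
# Class substrings taken from the question's second and third XPath expressions.
ACTIVE_QUERY = "swiper-slide swiper-slide-visible swiper-slide-active swiper-slide-thumb-active"
VISIBLE_QUERY = "swiper-slide swiper-slide-visible"


def xpath_contains(class_attr: str, needle: str) -> bool:
    """Mimic XPath contains(@class, needle): a plain substring test."""
    return needle in class_attr


# Hypothetical class attribute of the currently active thumbnail.
active_slide_class = ACTIVE_QUERY

# The active slide satisfies both queries, so its URL would be appended twice.
matches_active = xpath_contains(active_slide_class, ACTIVE_QUERY)
matches_visible = xpath_contains(active_slide_class, VISIBLE_QUERY)
print(matches_active, matches_visible)
```

The first locator in the question targets 'swiper-button-next', which is presumably a navigation arrow rather than a slide, so whatever URL its style carries is unlikely to be one of the images.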
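Also, presence_of_all_elements_located does not wait for all of the elements; it returns as soon as at least one matching element is present.
Beyond that, a presence condition only waits for the element to exist in the DOM, which may not be enough. I suggest using visibility_of_element_located instead.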
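If the expected number of slides is known in advance, another option is a custom wait condition, since WebDriverWait.until accepts any callable that takes the driver and returns a truthy value once the wait should end. A minimal sketch, not part of the original answer; the helper name at_least_n_elements is hypothetical:

```python
def at_least_n_elements(locator, n):
    """Return a wait condition that succeeds once `n` or more elements match.

    `locator` is a (by, value) tuple such as (By.XPATH, "//div[...]").
    """
    def _condition(driver):
        elements = driver.find_elements(*locator)
        # WebDriverWait keeps polling while the condition returns a falsy value.
        return elements if len(elements) >= n else False
    return _condition
```

It would be passed to the wait like any built-in condition, for example wait.until(at_least_n_elements((By.XPATH, "//div[contains(@class,'swiper-slide')]"), 5)), where 5 is a made-up count.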
I think the following code will work better:
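import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
links = []
wait = WebDriverWait(driver, 20)
# Wait until at least one slide is visible, then give the remaining slides a moment to render.
wait.until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class,'swiper-slide')]")))
time.sleep(0.5)
all_images = driver.find_elements(By.XPATH, "//div[contains(@class,'swiper-slide')]")  # find_elements_by_xpath was removed in Selenium 4.3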
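From here, the remaining steps are to pull the URL out of each element's style attribute, drop repeats, and save each image once. A minimal sketch under stated assumptions: the images are assumed to be CSS background-image values of the form url("..."), as the question's split-based parsing suggests; extract_url, unique_in_order, and save_images are hypothetical helper names; and the fetch parameter exists only so the download step can be swapped out.

```python
import os
import re

# Matches the contents of url(...) with optional single or double quotes.
URL_PATTERN = re.compile(r"url\(\s*['\"]?(.*?)['\"]?\s*\)")


def extract_url(style):
    """Return the URL inside a CSS url(...) value, or None if there is none."""
    match = URL_PATTERN.search(style or "")
    return match.group(1) if match else None


def unique_in_order(urls):
    """Drop None values and repeats while keeping first-seen order."""
    # dict preserves insertion order, so the first occurrence of each URL wins.
    return list(dict.fromkeys(url for url in urls if url))


def save_images(urls, title, directory=".", fetch=None):
    """Save every URL once as <title><n>.jpg and return the written paths."""
    if fetch is None:
        import requests  # imported lazily so the helpers above work without it

        def fetch(url):
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            return response.content

    paths = []
    for index, url in enumerate(urls, start=1):
        path = os.path.join(directory, f"{title.replace(' ', '-')}{index}.jpg")
        with open(path, "wb") as file:
            file.write(fetch(url))
        paths.append(path)
    return paths
```

With the all_images list from the code above, this could be used as links = unique_in_order(extract_url(el.get_attribute('style')) for el in all_images), followed by save_images(links, title). Note that the question's loop over range(len(links)//2 + 1) only visits roughly the first half of the list, which would explain some of the missing images; iterating over the de-duplicated list avoids that.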