While statement keeps looping even though that should not be technically possible
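I'm writing a Selenium Python script that should scrape every hyperlink from all pages, moving between pages by clicking the "Next" button. It scrapes all the links successfully, but when it reaches the last page, where the "Next" button element should no longer exist, it keeps looping on that page and writes the scraped data to the CSV file over and over.
As I understand my while and try/except setup, this should not be technically possible. I've been fiddling with the code for hours, lost some hair over it, and still haven't managed to fix it.
This is the website I'm trying to scrape:
https://www.sreality.cz/adresar
As you can see, the company names are in red and there is a "Next" arrow button at the bottom. Here is the code that is supposed to scrape all the links: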
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import WebDriverException, TimeoutException
from platform import system
from os import getcwd, getlogin
import csv

# `driver` is assumed to be an already-created WebDriver instance.
wait = WebDriverWait(driver, 10)
with open('links.csv', 'w+', newline='') as write:
    driver.get("https://www.sreality.cz/adresar")
    writer = csv.writer(write)
    page_spawn = 0
    while page_spawn == 0:
        try:
            links = wait.until(ec.presence_of_all_elements_located((By.CSS_SELECTOR, "h2.title > a")))
            #print(len(links))
            for link in links:
                print(link.get_attribute("href"))
                writer.writerow([link.get_attribute("href")])
            wait.until(ec.element_to_be_clickable((By.CSS_SELECTOR, "a.btn-paging-pn.icof.icon-arr-right.paging-next"))).click()
        except TimeoutException:
            page_spawn = 1
            break
You are not changing the value of page_spawn inside the try block, which may be why it loops n times.
The arrow button element still exists on the last page, but it is disabled:
>> window.location
Location https://www.sreality.cz/adresar?strana=152
>> document.querySelector("a.btn-paging-pn.icof.icon-arr-right.paging-next")
<a class="btn-paging-pn icof icon-…ht paging-next disabled" ng-href="" ng-class="{disabled: !pagingData.nextUrl}">
Calling the click() method on that element does nothing.
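The effect on the loop can be reproduced without a browser. The following is only a sketch of the control flow: `FakePager` and `run` are hypothetical names, the built-in `TimeoutError` stands in for Selenium's `TimeoutException`, and the iteration cap exists only so the demonstration terminates.

```python
class FakePager:
    """Hypothetical stand-in for the browser: 3 pages, 'next' never raises."""

    def __init__(self, last_page=3):
        self.page = 1
        self.last_page = last_page

    def click_next(self):
        # On the last page the button is disabled: the click succeeds
        # (no exception is raised) but the page does not change.
        if self.page < self.last_page:
            self.page += 1


def run(pager, max_iterations=10):
    """Mimic the question's loop; return the pages visited, capped for safety."""
    visited = []
    for _ in range(max_iterations):  # the cap replaces the unbounded `while`
        try:
            visited.append(pager.page)  # "scrape" the current page
            pager.click_next()
        except TimeoutError:
            break  # never reached, because click_next() never raises
    return visited


visited = run(FakePager())
print(visited)
```

Because nothing in the try block ever raises, the except branch is the only exit and it is never taken, so the last page is visited again on every iteration until the cap is hit.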
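Since the disabled element has disabled among its class values, appending :not(.disabled) to the end of that selector will stop it from matching the disabled element: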
>> window.location
Location https://www.sreality.cz/adresar?strana=152
>> document.querySelector("a.btn-paging-pn.icof.icon-arr-right.paging-next:not(.disabled)")
null
while it still matches the non-disabled element:
>> window.location
Location https://www.sreality.cz/adresar?strana=151
>> document.querySelector("a.btn-paging-pn.icof.icon-arr-right.paging-next:not(.disabled)")
<a class="btn-paging-pn icof icon-arr-right paging-next" ng-href="/adresar?strana=152" ng-class="{disabled: !pagingData.nextUrl}" href="/adresar?strana=152">
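Applied to the script from the question, the change might look like the sketch below. It is not a tested drop-in replacement: `scrape_links` and the two constants are names introduced here for illustration, and `driver` is assumed to be a WebDriver that has already loaded the first page.

```python
NEXT_BUTTON = "a.btn-paging-pn.icof.icon-arr-right.paging-next"
# Skip the button once it carries the `disabled` class (the last page).
ENABLED_NEXT_BUTTON = NEXT_BUTTON + ":not(.disabled)"


def scrape_links(driver, writer, timeout=10):
    """Write each link's href to `writer`, page by page; return the page count."""
    # Imported lazily so the constants above can be used without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as ec
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    pages = 0
    while True:
        # A timeout here propagates: a page without links is treated as an error.
        links = wait.until(
            ec.presence_of_all_elements_located((By.CSS_SELECTOR, "h2.title > a"))
        )
        for link in links:
            writer.writerow([link.get_attribute("href")])
        pages += 1
        try:
            # On the last page nothing matches, so this wait times out.
            wait.until(
                ec.element_to_be_clickable((By.CSS_SELECTOR, ENABLED_NEXT_BUTTON))
            ).click()
        except TimeoutException:
            return pages
```

It would be called as `scrape_links(driver, csv.writer(f))` inside the existing `with open('links.csv', 'w+', newline='') as f:` block. Only the wait for the next button sits inside the try, so a timeout there means "no more pages" and nothing else. Depending on how quickly the page re-renders after the click, an extra wait for the new links may still be needed.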