Python for loop skips iteration
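I made a Selenium bot that iterates through a list of territorial codes and sends each code to the search box of a website. The website converts the code to a city name, which I then scrape so that I end up with a list of cities instead of a list of codes. The problem is that while my for loop goes through the list, it sometimes "skips" the commands and goes straight to the next iteration, so I don't get the full list of cities. Some codes in the list are missing or not suitable to be passed to the website, so I handle those cases separately.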
import time
import pandas
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_driver_path = "D:\Development\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path)
driver.get("https://eteryt.stat.gov.pl/eTeryt/rejestr_teryt/udostepnianie_danych/baza_teryt/uzytkownicy_indywidualni/wyszukiwanie/wyszukiwanie.aspx?contrast=default")

# Get the column with the codes from excel sheet and redo it into the list.
data = pandas.read_excel(r"D:\NFZ\FINAL Baza2WUJEK(poprawione1)-plusostatniepoprawki.xlsx")
codes = data["Kody terytorialne"].tolist()

cities = []
iteration = 0
for code in codes:
    time.sleep(0.05)
    iteration += 1
    print(iteration)
    if code == "Absence":
        cities.append("Absence")
    elif code == "Error":
        cities.append("Error")
    elif code == 2211041 or code == 2211021:
        cities.append("Manual")
    else:
        # Send territorial code
        driver.find_element_by_xpath('//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]').clear()
        driver.find_element_by_xpath('//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]').send_keys(code)
        # Search
        try:
            button = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.XPATH,
                    '/html/body/form/section/div/div[2]/div[2]/div/div[2]/div/div[2]/div[1]/div[2]/div[1]/div/input')))
            button.click()
        except:
            button = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.XPATH,
                    '/html/body/form/section/div/div[2]/div[2]/div/div[2]/div/div[2]/div[1]/div[2]/div[1]/div/input')))
            button.click()
        # Scrape city name
        city = WebDriverWait(driver, 20).until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="body_TabContainer1_TabPanel1_GVTERC"]/tbody/tr[2]/td[1]/strong'))).text.split()
        print(code)
        print(city)
        cities.append(city)

table = {
    "Cities": cities
}
df = pandas.DataFrame.from_dict(table)
df.to_excel("cities-FINAL.xlsx")
driver.close()
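This is part of my console log. As you can see, after printing iteration number 98 it jumps straight to 99, where everything works normally and both the territorial code and the city are printed. The problem recurs deeper into the loop, but it always starts at the 98th iteration. The territorial code for that iteration is not one of the special cases.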
96 <-- Iteration
2201025 <-- Territorial Code
['Kędzierzyn-Koźle', '(2201025)'] <-- City Name
97
2262011
['Bytów', '(2262011)']
98 !<-- Just iteration!
99
2205084
['Gdynia', '(2208011)']
**!Quick Note due to the answers! Here is the order of the print statements in the console. First: number of the iteration, Second: Territorial Code related to the iteration, Third: City Name**
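There are several issues here:
- Your locators are brittle. Absolute XPaths such as /html/body/form/section/div/... break as soon as the page layout changes.
- I see that your results are not correct. For example, for the input "2262011" the output is "Gdynia (2262011)", while you present this output for the input "2205084".
- Your except block is the same as your try block. That makes no sense: if the code did not work inside try, why would it work on a second attempt without any change?
- It is also better to wait for element visibility rather than presence, because at the moment an element first appears in the DOM it may not yet be ready to be clicked.
- It is better to keep the element locators at least at the top of the class or module rather than hardcoded inside the code.
I tried to make your code a bit better.
Please try it.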
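The second and third points can be made concrete with a small check: compare the code shown in the scraped result with the code that was submitted, and only retry after something has changed (here, after a pause that lets the page refresh). This is a minimal sketch, assuming the result text always carries the code in parentheses as in the console log above. `scraped_code`, `lookup_with_retry` and the `fetch` callback are hypothetical names introduced for illustration, not part of Selenium.

```python
import re
import time


def scraped_code(tokens):
    """Return the code found in a '(1234567)' token, or None if there is none."""
    for token in tokens:
        match = re.fullmatch(r"\((\d+)\)", token)
        if match:
            return match.group(1)
    return None


def lookup_with_retry(fetch, code, attempts=3, delay=1.0):
    """Call fetch(code) until the scraped code matches the requested one.

    fetch is a hypothetical callback that submits the code and returns the
    scraped text split into tokens. Returns the token list on success, or
    None once all attempts are used up.
    """
    for attempt in range(attempts):
        tokens = fetch(code)
        if scraped_code(tokens) == str(code):
            return tokens
        if attempt < attempts - 1:
            time.sleep(delay)  # let the page refresh before reading again
    return None
```

Returning None instead of raising keeps the `cities` list the same length as `codes`, so a failed lookup can be stored as a placeholder and checked later.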
import time
import pandas
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Raw string so the backslashes in the Windows path are kept literally.
chrome_driver_path = r"D:\Development\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path)
driver.get("https://eteryt.stat.gov.pl/eTeryt/rejestr_teryt/udostepnianie_danych/baza_teryt/uzytkownicy_indywidualni/wyszukiwanie/wyszukiwanie.aspx?contrast=default")

# Get the column with the codes from the Excel sheet and turn it into a list.
data = pandas.read_excel(r"D:\NFZ\FINAL Baza2WUJEK(poprawione1)-plusostatniepoprawki.xlsx")
codes = data["Kody terytorialne"].tolist()

# Locators kept in one place instead of being hardcoded inside the loop.
code_input_xpath = '//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]'
search_button_xpath = '//input[@id="body_TabContainer1_TabPanel1_BJPTWyszukaj"]'
city_xpath = '//table[@id="body_TabContainer1_TabPanel1_GVTERC"]//td/strong'

cities = []
iteration = 0
for code in codes:
    time.sleep(0.1)
    iteration += 1
    print(iteration)
    if code == "Absence":
        cities.append("Absence")
    elif code == "Error":
        cities.append("Error")
    elif code == 2211041 or code == 2211021:
        cities.append("Manual")
    else:
        # Send territorial code
        driver.find_element(By.XPATH, code_input_xpath).clear()
        driver.find_element(By.XPATH, code_input_xpath).send_keys(code)
        # Search: wait until the button is visible, not merely present
        button = WebDriverWait(driver, 20).until(
            EC.visibility_of_element_located((By.XPATH, search_button_xpath)))
        button.click()
        # Scrape city name after giving the result table time to refresh
        time.sleep(2)
        city = WebDriverWait(driver, 20).until(
            EC.visibility_of_element_located((By.XPATH, city_xpath))).text.split()
        print(code)
        print(city)
        cities.append(city)

table = {
    "Cities": cities
}
df = pandas.DataFrame.from_dict(table)
df.to_excel("cities-FINAL.xlsx")
driver.close()
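The fixed `time.sleep(2)` before scraping could be replaced by a wait that only succeeds once the result shows the code that was just submitted. This is a minimal sketch, assuming the result cell always contains the code in parentheses as in the console log. `text_contains_code` is a hypothetical helper name, not part of Selenium; it relies only on `driver.find_elements(by, value)` and on `WebDriverWait.until` accepting any callable that takes the driver.

```python
def text_contains_code(xpath, code):
    """Build a wait condition that passes once a matching element shows `(code)`.

    Hypothetical helper: the returned callable takes the driver, as
    WebDriverWait.until expects, and returns the split text or False.
    """
    wanted = f"({code})"

    def _condition(driver):
        # "xpath" is the string value of Selenium's By.XPATH constant.
        for element in driver.find_elements("xpath", xpath):
            tokens = element.text.split()
            if wanted in tokens:
                return tokens
        return False  # a falsy result makes WebDriverWait poll again

    return _condition


# Intended use inside the loop (needs a live driver, so left as a comment):
# city = WebDriverWait(driver, 20).until(text_contains_code(city_xpath, code))
```

If the table is re-rendered while it is being read, `element.text` may raise `StaleElementReferenceException`; passing that exception class through the `ignored_exceptions` argument of `WebDriverWait` is one way to keep polling instead of failing.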