Python for loop skips iteration

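I built a Selenium bot that iterates over a list of territorial codes and sends each code to a website's search box. The site resolves the code to a city name, which I then scrape so that I end up with a list of cities instead of a list of codes. The problem is that as my for loop goes through the list, it sometimes "skips" the commands in the loop body and goes straight to the next iteration, so I don't get the complete list of cities. Some codes in the list are missing or not suitable for passing to the website, so I handle those cases separately.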

import time
import pandas
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_driver_path = "D:\Development\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path)
driver.get("https://eteryt.stat.gov.pl/eTeryt/rejestr_teryt/udostepnianie_danych/baza_teryt/uzytkownicy_indywidualni/wyszukiwanie/wyszukiwanie.aspx?contrast=default")

# Get the column with the codes from excel sheet and redo it into the list.
data = pandas.read_excel(r"D:\NFZ\FINAL Baza2WUJEK(poprawione1)-plusostatniepoprawki.xlsx")
codes = data["Kody terytorialne"].tolist()


cities = []


iteration = 0

for code in codes:
    time.sleep(0.05)
    iteration += 1
    print(iteration)
    if code == "Absence":
        cities.append("Absence")
    elif code == "Error":
        cities.append("Error")
    elif code == 2211041 or code == 2211021:
        cities.append("Manual")
    else:
        # Send territorial code
        driver.find_element_by_xpath('//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]').clear()
        driver.find_element_by_xpath('//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]').send_keys(code)
        # Search
        try:
            button = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.XPATH,
                                                '/html/body/form/section/div/div[2]/div[2]/div/div[2]/div/div[2]/div[1]/div[2]/div[1]/div/input')))
            button.click()
        except:
            button = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.XPATH,
                                                '/html/body/form/section/div/div[2]/div[2]/div/div[2]/div/div[2]/div[1]/div[2]/div[1]/div/input')))
            button.click()
        # Scrape city name
        city = WebDriverWait(driver, 20).until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="body_TabContainer1_TabPanel1_GVTERC"]/tbody/tr[2]/td[1]/strong'))).text.split()
        print(code)
        print(city)
        cities.append(city)


table = {
    "Cities": cities
}

df = pandas.DataFrame.from_dict(table)
df.to_excel("cities-FINAL.xlsx")
driver.close()

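Here is part of my console log. As you can see, after printing iteration number 98 it jumps straight to 99, where it works completely normally and prints the territorial code and the city. The problem occurs again deeper in the loop, but every time it starts at iteration 98. The territorial code for that iteration is not one of the exceptions.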

96 <-- Iteration
2201025 <-- Territorial Code
['Kędzierzyn-Koźle', '(2201025)'] <-- City Name
97
2262011
['Bytów', '(2262011)']
98 !<-- Just iteration!
99
2205084
['Gdynia', '(2208011)']

**!Quick Note due to the answers! Here is the order of the print statements in the console. First: number of the iteration, Second: Territorial Code related to the iteration, Third: City Name**

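There are several problems here:

  1. Your locators are very poor. Absolute XPaths such as `/html/body/form/section/div/...` break as soon as the page layout changes.
  2. I see that your results are incorrect. For example, for the input "2262011" the output is "Gdynia (2262011)", whereas you show this output for the input "2205084".
  3. Your except block is the same as your try block. That makes no sense: if the code fails in the try block, why would it work on the second attempt without any changes?
  4. It is also better to wait for element visibility rather than presence, because at the moment an element first appears it may not yet be fully ready to be clicked.
  5. It is better to keep the element locators at least at the top of the class or script, rather than hardcoded throughout the code.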
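
Point 3 can be illustrated with a small retry helper. This is only a minimal sketch and not part of the original script: `retry` and its parameters are hypothetical names. Unlike a copy of the try block inside a bare `except`, it pauses between attempts so that the page has a chance to change, and it catches only the exception types it is given. With Selenium, `action` would typically be a function that locates the search button and clicks it.

```python
import time


def retry(action, attempts=3, delay=0.5, exceptions=(Exception,)):
    """Call action() until it succeeds or the attempts run out.

    Hypothetical helper for illustration only.
    action     -- zero-argument callable, e.g. a function that clicks a button
    attempts   -- maximum number of calls (must be at least 1)
    delay      -- seconds to wait between failed attempts
    exceptions -- exception types that should trigger another attempt
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions as error:
            last_error = error
            # Wait before trying again so the state can actually change.
            if attempt < attempts:
                time.sleep(delay)
    # Every attempt failed: re-raise the last error instead of hiding it.
    raise last_error
```

Passing a narrow tuple of exception types (for Selenium, for example, `TimeoutException`) keeps unrelated errors visible instead of silently retrying them.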

I tried to improve your code a little.
Please give it a try.

import time
import pandas
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Raw string so the backslashes in the Windows path are never treated as escapes.
chrome_driver_path = r"D:\Development\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path)
driver.get("https://eteryt.stat.gov.pl/eTeryt/rejestr_teryt/udostepnianie_danych/baza_teryt/uzytkownicy_indywidualni/wyszukiwanie/wyszukiwanie.aspx?contrast=default")

# Get the column with the codes from excel sheet and redo it into the list.
data = pandas.read_excel(r"D:\NFZ\FINAL Baza2WUJEK(poprawione1)-plusostatniepoprawki.xlsx")
codes = data["Kody terytorialne"].tolist()

# Locators kept in one place; each one is a valid XPath expression.
code_input_xpath = '//input[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]'
search_button_xpath = '//input[@id="body_TabContainer1_TabPanel1_BJPTWyszukaj"]'
city_xpath = '//table[@id="body_TabContainer1_TabPanel1_GVTERC"]//td/strong'



cities = []


iteration = 0

for code in codes:
    time.sleep(0.1)
    iteration += 1
    print(iteration)
    if code == "Absence":
        cities.append("Absence")
    elif code == "Error":
        cities.append("Error")
    elif code == 2211041 or code == 2211021:
        cities.append("Manual")
    else:
        # Send territorial code
        code_input = driver.find_element(By.XPATH, code_input_xpath)
        code_input.clear()
        code_input.send_keys(code)
        # Search
        button = WebDriverWait(driver, 20).until(
            EC.visibility_of_element_located((By.XPATH, search_button_xpath)))
        button.click()
        # Scrape city name
        time.sleep(2)
        city = WebDriverWait(driver, 20).until(
            EC.visibility_of_element_located((By.XPATH, city_xpath))).text.split()
        print(code)
        print(city)
        cities.append(city)


table = {
    "Cities": cities
}

df = pandas.DataFrame.from_dict(table)
df.to_excel("cities-FINAL.xlsx")
driver.close()
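
The fixed `time.sleep(2)` before scraping can still pick up the previous search's result if the page responds slowly. Below is a minimal sketch of one alternative, assuming the result text always has the form "Name (code)" as in the console log above: keep polling until the code shown in the result matches the code that was just submitted. `parse_result`, `wait_for_result`, and `read_text` are hypothetical names, not part of Selenium. In the script, `read_text` would be something like `lambda: driver.find_element(By.XPATH, city_xpath).text`.

```python
import re
import time

# Assumed result format: a name followed by the code in parentheses,
# e.g. "Bytów (2262011)".
RESULT_PATTERN = re.compile(r"^(?P<name>.+?)\s*\((?P<code>\d+)\)$")


def parse_result(text):
    """Split 'Name (code)' into (name, code); return None if it does not match."""
    match = RESULT_PATTERN.match(text.strip())
    if match is None:
        return None
    return match.group("name"), match.group("code")


def wait_for_result(read_text, code, timeout=20.0, poll=0.2):
    """Poll read_text() until the result shows the submitted code.

    read_text -- zero-argument callable returning the current result text
    code      -- the territorial code that was just submitted
    Returns the name part, or raises TimeoutError if the matching result
    never appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            parsed = parse_result(read_text())
        except Exception:
            # Assumption: any read failure means "not ready yet". In real use,
            # catch only the specific Selenium exceptions you expect.
            parsed = None
        if parsed is not None and parsed[1] == str(code):
            return parsed[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result for code {code} after {timeout} s")
        time.sleep(poll)
```

Inside the loop, the scraping step could then become `city = wait_for_result(read_text, code)`, with the timeout handled explicitly (for example by appending "Error") so that every code still produces exactly one entry in `cities`.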