Selenium web driver scraper

I am posting this question again because I could not post it correctly the first time. I have built a Selenium scraper for the website https://maharerait.mahaonline.gov.in/searchlist/searchlist

Whenever I run it, it iterates through each dropdown list, and when it reaches useful data to scrape into the CSV, it throws the following error:

Traceback (most recent call last):
  File "C:\Users\prince.bhatia\Desktop\maharera\Maha_Rera.py", line 66, in <module>
    selectVillage.select_by_index(villageElement)
  File "C:\Users\prince.bhatia\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\support\select.py", line 103, in select_by_index
    raise NoSuchElementException("Could not locate element with index %d" % index)
selenium.common.exceptions.NoSuchElementException: Message: Could not locate element with index 33

Below is my code:

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


import os
import time
import csv
import sys

driver = webdriver.Chrome("./chromedriver")

driver.get('https://maharerait.mahaonline.gov.in/searchlist/searchlist')

# try:
#     element = WebDriverWait(driver, 100).until(
#         EC.presence_of_element_located((By.ID, "Promoter"))
#     )
# finally:
#     print('0000000000000000000000')
#     driver.quit()

time.sleep(1)

driver.find_element_by_id('Promoter').click()

divisionLength = len(Select(driver.find_element_by_id('Division')).options)
print('*********{}'.format(divisionLength))

firstRow = 0
titleRow = []
contentRows = []

for divisionElement in range(1,divisionLength):
    selectDivision = Select(driver.find_element_by_id('Division'))
    selectDivision.options
    selectDivision.select_by_index(divisionElement)
    time.sleep(1)
    districtLength = len(Select(driver.find_element_by_id('District')).options)
    while districtLength == 1:
        pass
    print(districtLength)
    for districtElement in range(1,districtLength):
        selectDistrict = Select(driver.find_element_by_id('District'))
        selectDistrict.options

        selectDistrict.select_by_index(districtElement)

        time.sleep(1)

        talukaLength = len(Select(driver.find_element_by_id('Taluka')).options)
        print('/-----taluka numbers: {}-------/'.format(talukaLength))
        for talukaElement in range(1, talukaLength):
            selectTaluka = Select(driver.find_element_by_id('Taluka'))
            selectTaluka.options
            selectTaluka.select_by_index(talukaElement)
            time.sleep(1)

            villageLength = len(Select(driver.find_element_by_id('Village')).options)
            print('/-----village numbers: {}-------/'.format(villageLength))
            for villageElement in range(1, villageLength):
                selectVillage = Select(driver.find_element_by_id('Village'))
                selectVillage.options
                selectVillage.select_by_index(villageElement)
                time.sleep(2)
                projectLength = len(Select(driver.find_element_by_id('Project')).options)

                print('/------------------------------/')
                print('/-----project number: {}-------/'.format(projectLength))
                print('/------------------------------/')
                if projectLength == 1:
                    continue

                for projectElement in range(1,projectLength):
                    selectProject = Select(driver.find_element_by_id('Project'))
                    selectProject.options

                    while len(selectProject.options) == 1:
                        pass
                    # c = len(select.options)
                    # print('---------------{}'.format(c))

                    # titleRow = []
                    # contentRows = []
                    # firstRow = 0

                    # for i in range(1,c):
                    #     select = Select(driver.find_element_by_id('Project'))
                    #     while len(select.options) == 1:
                    #         pass
                    time.sleep(1)
                    selectProject.select_by_index(projectElement)

                    driver.find_element_by_id('btnSearch').click()
                    tableRows = driver.find_element_by_class_name('table').find_elements_by_tag_name('tr')

                    if firstRow == 0:
                        headRow = tableRows[0].find_elements_by_tag_name('th')
                        for headRowData in range(0,len(headRow)):
                            text = headRow[headRowData].find_element_by_tag_name('span').text
                            titleRow.append(text)
                        firstRow = firstRow + 1

                    for dataRowsNumbers in range(1,len(tableRows)):
                        dataRow = tableRows[dataRowsNumbers].find_elements_by_tag_name('td')
                        tempList = []
                        for dataRowContents in range(0,len(dataRow)):
                            try:
                                a_link = dataRow[dataRowContents].find_element_by_tag_name('a').get_attribute('href')
                                tempList.append(str(a_link))
                            except:
                                tempList.append(str(dataRow[dataRowContents].text))
                            # if dataRow[dataRowContents].text == 'View':
                            #     a_link = dataRow[dataRowContents].find_element_by_tag_name('a').get_attribute('href')
                            #     tempList.append(str(a_link))
                            # else:
                            #     tempList.append(str(dataRow[dataRowContents].text))
                            print(dataRow[dataRowContents].text)
                        print(tempList)
                        contentRows.append(tempList)
# print('Automated check is over')
# print('Stored data in programs is as below:')
# print(contentRows)
if sys.version_info[0] <= 2:
    with open("./data.csv",'wb') as csvfile:
        csvfile = csv.writer(csvfile, delimiter=',')
        csvfile.writerow(titleRow)
        csvfile.writerow("")
        for i in range(0,len(contentRows)):
            csvfile.writerow(contentRows[i])
else:
    with open("./data.csv",'w',newline='') as csvfile:
        csvfile = csv.writer(csvfile, delimiter=',')
        csvfile.writerow(titleRow)
        csvfile.writerow("")
        for i in range(0,len(contentRows)):
            csvfile.writerow(contentRows[i])
driver.close()
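Separately from the index error, note that `while districtLength == 1: pass` in the code above never re-reads the dropdown: `districtLength` is a plain integer, so if it is 1 the loop spins forever. Below is a minimal sketch of a bounded polling wait, assuming a hypothetical `count_options` callable that re-reads the list on every call (for example `lambda: len(Select(driver.find_element_by_id('District')).options)`). Selenium's own `WebDriverWait(driver, timeout).until(...)` accepts a similar condition callable and would be the more usual choice; this plain-Python version only keeps the example free of a browser dependency.

```python
import time
from typing import Callable


def wait_for_options(count_options: Callable[[], int], minimum: int = 2,
                     timeout: float = 10.0, poll: float = 0.25) -> int:
    """Poll until count_options() reports at least `minimum` options.

    `count_options` is a hypothetical callable that re-reads the dropdown
    each time it is called. Returns the final count, or raises
    TimeoutError if the list is still too short when `timeout` expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        count = count_options()  # re-read on every iteration, unlike the original loop
        if count >= minimum:
            return count
        if time.monotonic() >= deadline:
            raise TimeoutError("dropdown still has %d option(s)" % count)
        time.sleep(poll)
```

With `minimum=2`, the wait returns as soon as the dropdown holds at least one real option besides the first entry that the question's `range(1, ...)` loops skip.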

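Can someone tell me where I went wrong? I am using Python 3.6. I have closed my previous question. I had to indent the code by 4 spaces to post it here, but apart from the error, the original code is written correctly.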

Your error is raised on this line:

selectVillage.select_by_index(villageElement)

According to the documentation:

This is done by examining the "index" attribute of an element, and not merely by counting.
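To make the quoted behaviour concrete, here is a simplified, dependency-free sketch of the lookup that Selenium 3's `select_by_index` performs: it scans the options present in the dropdown at the moment of the call and compares each option's `index` attribute with the requested number. The helper name `find_option_by_index` is hypothetical and this is not Selenium's actual code; it only needs objects that have a `get_attribute` method. Because the DOM `index` of an `<option>` is its current 0-based position in the list, a lookup for index 33 can only fail if, at that moment, the list holds 33 options or fewer.

```python
from typing import Iterable, Optional


def find_option_by_index(options: Iterable, index: int) -> Optional[object]:
    """Return the option whose "index" attribute equals `index`, else None.

    Simplified sketch of Selenium 3's Select.select_by_index lookup: the
    real method selects the match and raises NoSuchElementException when
    nothing matches, instead of returning None.
    """
    wanted = str(index)  # get_attribute returns strings, so compare as text
    for opt in options:
        if opt.get_attribute("index") == wanted:
            return opt
    return None
```

In the scraper, printing `len(selectVillage.options)` right before the failing call would show whether the village list had changed between measuring `villageLength` and selecting.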

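Therefore, you need to change your code to iterate over the option elements themselves rather than over index ranges such as range(1,projectLength).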
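A minimal sketch of iterating over the options themselves, assuming the first option of each dropdown is a placeholder (as the question's `range(1, ...)` loops imply). It snapshots the option texts once, then re-locates the dropdown and calls Selenium's `Select.select_by_visible_text` for each text, so no numeric index is involved. `get_select` is a hypothetical zero-argument callable, for example `lambda: Select(driver.find_element_by_id('Village'))`; the helper names are illustrative and not part of Selenium.

```python
from typing import Callable, Iterator, List


def option_texts(select) -> List[str]:
    """Snapshot the visible text of every option except the first (placeholder).

    `select` is expected to behave like selenium's Select: it exposes
    `.options`, a list of elements that each have a `.text` attribute.
    """
    return [opt.text for opt in select.options[1:]]


def iterate_dropdown(get_select: Callable[[], object]) -> Iterator[str]:
    """Select each real option in turn, yielding its text.

    The dropdown is re-located before every selection so the code never
    reuses an element the page may have replaced in the meantime.
    """
    for text in option_texts(get_select()):
        get_select().select_by_visible_text(text)
        yield text
```

In the village loop this would take the form `for village in iterate_dropdown(lambda: Select(driver.find_element_by_id('Village'))):` followed by the existing project logic. Re-locating on each step is a deliberate choice, since the page appears to repopulate these lists after selections.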

Update:

According to this, the solution may be as simple as starting the range at 0 instead of 1:

for villageElement in range(0, villageLength - 1):

Note: if this works, you will need to change the other loops in the same way.
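To see exactly what the suggested change does, this snippet lists the indices each `range` produces for a dropdown of 34 options (a hypothetical count). Both loops visit 33 indices; the suggested one shifts every index down by one, so it stops at 32 and also visits index 0, which in these dropdowns is presumably the placeholder option that the original loops skip.

```python
def index_ranges(length):
    """Return the indices visited by the question's loop and by the suggested loop."""
    original = list(range(1, length))        # question: 1 .. length-1
    suggested = list(range(0, length - 1))   # answer:   0 .. length-2
    return original, suggested


original, suggested = index_ranges(34)
print(original[0], original[-1], len(original))     # 1 33 33
print(suggested[0], suggested[-1], len(suggested))  # 0 32 33
```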