Web scraping data from Glassdoor using Selenium

I need some help running this code (https://github.com/PlayingNumbers/ds_salary_proj/blob/master/glassdoor_scraper.py) to scrape job listing data from Glassdoor. Here is the code snippet:

from selenium.common.exceptions import NoSuchElementException, ElementClickInterceptedException
from selenium import webdriver
import time
import pandas as pd

options = webdriver.ChromeOptions()
    
#Uncomment the line below if you'd like to scrape without a new Chrome window every time.
#options.add_argument('headless')
    
#Change the path to where chromedriver is in your home folder.
driver = webdriver.Chrome(executable_path=path, options=options)
driver.set_window_size(1120, 1000)
    
url = "https://www.glassdoor.com/Job/jobs.htm?suggestCount=0&suggestChosen=false&clickSource=searchBtn&typedKeyword="+'data scientist'+"&sc.keyword="+'data scientist'+"&locT=&locId=&jobType="
#url = 'https://www.glassdoor.com/Job/jobs.htm?sc.keyword="' + keyword + '"&locT=C&locId=1147401&locKeyword=San%20Francisco,%20CA&jobType=all&fromAge=-1&minSalary=0&includeNoSalaryJobs=true&radius=100&cityId=-1&minRating=0.0&industryId=-1&sgocId=-1&seniorityType=all&companyId=-1&employerSizes=0&applicationType=0&remoteWorkType=0'
driver.get(url)

#Let the page load. Change this number based on your internet speed.
#Or, wait until the webpage is loaded, instead of hardcoding it.
time.sleep(5)

#Test for the "Sign Up" prompt and get rid of it.
try:
    driver.find_element_by_class_name("selected").click()
except NoSuchElementException:
    pass
time.sleep(.1)
try:
    driver.find_element_by_css_selector('[alt="Close"]').click() #clicking to the X.
    print(' x out worked')
except NoSuchElementException:
    print(' x out failed')
    pass

        
#Going through each job in this page
job_buttons = driver.find_elements_by_class_name("jl")

I get an empty list:

job_buttons
[]

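Your problem is the wrong exception type in the except clause.
With driver.find_element_by_class_name("selected").click() you are trying to click an element that does not exist. No element on that page matches the class name "selected". This raises a NoSuchElementException, as you can see, while you are trying to catch an ElementClickInterceptedException.
To fix this, use the correct locator, or at least catch the correct exception in the except clause, like this: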

try:
    driver.find_element_by_class_name("selected").click()
except NoSuchElementException:
    pass

Or even:

try:
    driver.find_element_by_class_name("selected").click()
except:
    pass
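
As an alternative to catching the exception at all, here is a minimal sketch that relies on find_elements (plural), which returns an empty list when nothing matches instead of raising NoSuchElementException. The helper name click_if_present is hypothetical and not part of Selenium; driver is assumed to be a Selenium WebDriver. The click itself can still raise ElementClickInterceptedException if another element covers the target.

```python
# Sketch only: `driver` is assumed to be a Selenium WebDriver, or any object
# exposing the same find_elements(by, value) method.

CSS_SELECTOR = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def click_if_present(driver, css):
    """Click the first element matching `css`; return False if none exists."""
    # find_elements returns an empty list when nothing matches (no exception).
    matches = driver.find_elements(CSS_SELECTOR, css)
    if not matches:
        return False
    matches[0].click()
    return True
```

With a live driver this might be called as click_if_present(driver, '[alt="Close"]'), and the return value tells you whether the prompt was actually there.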

I'm not sure which elements you want to collect in job_buttons.
The search results, each containing all the details of one job, can be found with:

job_buttons = driver.find_elements_by_css_selector("li.react-job-listing")
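
Because the listings are rendered after the page loads, the lookup can still come back empty if it runs too early. Below is a rough sketch of polling for the elements instead of relying on a fixed time.sleep. The function name wait_for_job_cards and its timeout and poll parameters are assumptions for illustration, and the li.react-job-listing selector is taken from the line above and may stop matching if Glassdoor changes its markup. Selenium also ships its own explicit-wait helper, WebDriverWait in selenium.webdriver.support.ui, which is the more usual choice in real code.

```python
import time

CSS_SELECTOR = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def wait_for_job_cards(driver, css="li.react-job-listing", timeout=10.0, poll=0.5):
    """Poll until at least one element matches `css` or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        cards = driver.find_elements(CSS_SELECTOR, css)
        if cards or time.monotonic() >= deadline:
            return cards  # may be empty if nothing appeared in time
        time.sleep(poll)
```

With a live driver, job_buttons = wait_for_job_cards(driver) would replace both the fixed sleep and the direct lookup; an empty result after the timeout suggests the selector no longer matches the page.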