Web scraping data from Glassdoor using Selenium
I need some help running this code (https://github.com/PlayingNumbers/ds_salary_proj/blob/master/glassdoor_scraper.py) to scrape job listing data from Glassdoor.
Here is the code snippet:
from selenium.common.exceptions import NoSuchElementException, ElementClickInterceptedException
from selenium import webdriver
import time
import pandas as pd
options = webdriver.ChromeOptions()
#Uncomment the line below if you'd like to scrape without a new Chrome window every time.
#options.add_argument('headless')
#Change the path to where chromedriver is in your home folder.
driver = webdriver.Chrome(executable_path=path, options=options)
driver.set_window_size(1120, 1000)
url = "https://www.glassdoor.com/Job/jobs.htm?suggestCount=0&suggestChosen=false&clickSource=searchBtn&typedKeyword="+'data scientist'+"&sc.keyword="+'data scientist'+"&locT=&locId=&jobType="
#url = 'https://www.glassdoor.com/Job/jobs.htm?sc.keyword="' + keyword + '"&locT=C&locId=1147401&locKeyword=San%20Francisco,%20CA&jobType=all&fromAge=-1&minSalary=0&includeNoSalaryJobs=true&radius=100&cityId=-1&minRating=0.0&industryId=-1&sgocId=-1&seniorityType=all&companyId=-1&employerSizes=0&applicationType=0&remoteWorkType=0'
driver.get(url)
#Let the page load. Change this number based on your internet speed.
#Or, wait until the webpage is loaded, instead of hardcoding it.
time.sleep(5)
#Test for the "Sign Up" prompt and get rid of it.
try:
    driver.find_element_by_class_name("selected").click()
except NoSuchElementException:
    pass
time.sleep(.1)
try:
    driver.find_element_by_css_selector('[alt="Close"]').click() #clicking to the X.
    print(' x out worked')
except NoSuchElementException:
    print(' x out failed')
    pass
#Going through each job in this page
job_buttons = driver.find_elements_by_class_name("jl")
I get an empty list:
job_buttons
[]
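Your problem is the wrong argument to except.
With driver.find_element_by_class_name("selected").click() you are trying to click an element that does not exist: no element on that page matches the class name "selected". That raises NoSuchElementException, which is not caught when the except clause only names ElementClickInterceptedException.
To fix this, use a correct locator, or at least name the correct exception in except, like this: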
try:
    driver.find_element_by_class_name("selected").click()
except NoSuchElementException:
    pass
or even
try:
    driver.find_element_by_class_name("selected").click()
except:
    pass
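To illustrate why the exception type named in except matters, here is a minimal sketch that runs without a browser. NoSuchElement and ClickIntercepted are hypothetical stand-ins for Selenium's NoSuchElementException and ElementClickInterceptedException, and click and missing_element are illustrative helpers, not part of the original scraper. It also shows that a tuple of exception types is a narrower alternative to a bare except.

```python
# Stand-ins for selenium.common.exceptions.NoSuchElementException and
# ElementClickInterceptedException, so the sketch runs without a browser.
class NoSuchElement(Exception):
    pass


class ClickIntercepted(Exception):
    pass


def click(action, ignored):
    """Run action(); return True on success, False if it raised an ignored type."""
    try:
        action()
        return True
    except ignored:
        return False


def missing_element():
    # Mimics clicking an element whose locator matches nothing on the page.
    raise NoSuchElement("no element with class name 'selected'")


# A tuple lists every exception type the handler should swallow.
handled = click(missing_element, (NoSuchElement, ClickIntercepted))

# Naming only the wrong type lets the exception propagate to the caller.
try:
    click(missing_element, (ClickIntercepted,))
    propagated = False
except NoSuchElement:
    propagated = True
```

In the real script the same idea would read except (NoSuchElementException, ElementClickInterceptedException): which ignores both expected failures without hiding unrelated errors the way a bare except does.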
I'm not sure which elements you want to collect in job_buttons.
The search results, which contain all the details for each job, can be found like this:
job_buttons = driver.find_elements_by_css_selector("li.react-job-listing")
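Since the script's own comment suggests waiting for the page instead of hardcoding a sleep, and since class names on the site evidently change over time, the following is a rough sketch of a polling helper that tries several candidate selectors. The name find_first_nonempty and the timeout and poll parameters are assumptions made for illustration. It calls driver.find_elements(by, value), the form that replaced the find_elements_by_* helpers removed in Selenium 4.3, and spells out the "css selector" string (the value of By.CSS_SELECTOR) so the helper itself does not import Selenium.

```python
import time

# By.CSS_SELECTOR in Selenium is the plain string "css selector"; it is spelled
# out here so the helper has no import-time dependency on Selenium.
CSS = "css selector"


def find_first_nonempty(driver, selectors, timeout=10.0, poll=0.5):
    """Poll each CSS selector in turn until one matches or the timeout elapses.

    Returns (selector, elements), or (None, []) if nothing matched in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        for selector in selectors:
            elements = driver.find_elements(CSS, selector)
            if elements:
                return selector, elements
        if time.monotonic() >= deadline:
            return None, []
        time.sleep(poll)
```

With a real driver this might be called as selector, job_buttons = find_first_nonempty(driver, ["li.react-job-listing", ".jl"]), so an empty result is distinguishable from a page that simply had not finished loading. Selenium's built-in WebDriverWait is the more standard way to wait on a single condition.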