Udemy website keeps on loading while trying to Web Scrape with Selenium and Python
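I'm starting to learn web scraping. As practice, I'm trying to get a list of all the course names returned by this query: "https://www.udemy.com/courses/search/?src=ukw&q=api+python". The problem is that when I launch the script, the page never finishes loading and eventually the window closes. I suspect Udemy has some kind of protection against automation.
Here is my code: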
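from selenium import webdriver
from selenium.webdriver.common.by import By
import time
website = "https://www.udemy.com/courses/search/?src=ukw&q=api+python"
chrome_options = webdriver.ChromeOptions()
# Suppress Chrome's console log messages
chrome_options.add_experimental_option("excludeSwitches", ["enable-logging"])
driver = webdriver.Chrome(options=chrome_options)
driver.get(website)
time.sleep(5)  # fixed wait for the results to render
# find_elements_by_tag_name() was removed in Selenium 4.3; use find_elements(By.TAG_NAME, ...)
matches = driver.find_elements(By.TAG_NAME, "h3")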
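The fixed time.sleep(5) above is either wasted time or too short. Selenium's WebDriverWait(driver, timeout).until(condition) instead re-checks a condition (every 0.5 s by default) and raises TimeoutException once the timeout passes. The idea can be sketched with the standard library alone; wait_until and WaitTimeout are hypothetical names, not Selenium's API, and the "page" is simulated so the sketch runs without a browser:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition: Callable[[], T], timeout: float = 10.0, poll: float = 0.5) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Sketch of the idea behind WebDriverWait(...).until(...): return as soon as
    the condition holds instead of always sleeping a fixed amount of time.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


if __name__ == "__main__":
    # Simulated page: the <h3> titles "appear" on the third poll.
    calls = {"n": 0}

    def fake_find_h3():
        calls["n"] += 1
        return ["Course A", "Course B"] if calls["n"] >= 3 else []

    print(wait_until(fake_find_h3, timeout=5, poll=0.01))  # ['Course A', 'Course B']
```

With a real driver the condition would be something like lambda: driver.find_elements(By.TAG_NAME, "h3"), which returns an empty (falsy) list until the titles exist.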
The Udemy website may not load completely because it detects the Selenium-driven, ChromeDriver-initiated browser as a bot and blocks further navigation.
Solution
A simple way to help evade detection is to add the following argument:
--disable-blink-features=AutomationControlled
So your code block effectively becomes:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
# Both switches go in ONE call: a second add_experimental_option("excludeSwitches", ...)
# call replaces the first list instead of extending it.
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option("useAutomationExtension", False)
# Stop Blink from flagging the session as automated (navigator.webdriver)
options.add_argument("--disable-blink-features=AutomationControlled")
# Raw string so the backslashes in the Windows path are not read as escape sequences
s = Service(r"C:\BrowserDrivers\chromedriver.exe")
driver = webdriver.Chrome(service=s, options=options)
driver.get("https://www.udemy.com/courses/search/?src=ukw&q=api+python")
# Wait up to 20 s for the "... results for ..." heading instead of sleeping a fixed time
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[contains(., 'results for')]")))
driver.save_screenshot("udemy.png")
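One detail worth checking in option setups like this: Selenium's ChromeOptions keeps experimental options in a dictionary keyed by option name, so calling add_experimental_option("excludeSwitches", ...) twice keeps only the last list. A minimal stdlib sketch shows the effect; OptionsStub is a hypothetical stand-in that mimics only that storage behavior, not Selenium's class:

```python
class OptionsStub:
    """Hypothetical stand-in for ChromeOptions' experimental-option storage.

    Assumption: like Selenium's ChromeOptions, it stores experimental options
    in a dict keyed by name, so re-adding a name replaces the earlier value.
    """

    def __init__(self) -> None:
        self.experimental_options = {}

    def add_experimental_option(self, name, value) -> None:
        self.experimental_options[name] = value  # same key -> old value overwritten


# Two separate calls: "enable-automation" is silently lost.
split = OptionsStub()
split.add_experimental_option("excludeSwitches", ["enable-automation"])
split.add_experimental_option("excludeSwitches", ["enable-logging"])
print(split.experimental_options["excludeSwitches"])  # ['enable-logging']

# One call with a combined list keeps both switches.
combined = OptionsStub()
combined.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
print(combined.experimental_options["excludeSwitches"])  # ['enable-automation', 'enable-logging']
```

With the real class, the same check is options.experimental_options["excludeSwitches"] after the calls.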
[Saved screenshot: udemy.png]
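The answer stops at a screenshot, while the original goal was a list of course names. Assuming the titles are rendered in h3 elements (as the question's h3 lookup implies; Udemy's real markup may differ or change), one offline-testable sketch collects the text of every h3 from an HTML string such as driver.page_source using the standard library's html.parser. The sample HTML is made up for illustration:

```python
from html.parser import HTMLParser


class H3TextCollector(HTMLParser):
    """Collect the text of every <h3> element, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self._in_h3 = False
        self._buf = []
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_h3:
            self._in_h3 = False
            # Collapse whitespace left over from the markup
            text = " ".join("".join(self._buf).split())
            if text:
                self.titles.append(text)

    def handle_data(self, data):
        if self._in_h3:  # nested tags like <a> or <b> contribute their text too
            self._buf.append(data)


def course_titles(page_source: str) -> list:
    parser = H3TextCollector()
    parser.feed(page_source)
    parser.close()
    return parser.titles


sample = """
<div><h3 class="title"><a href="/course/x/">REST APIs with <b>Python</b></a></h3>
<h3>  Python API Development  </h3><p>not a title</p></div>
"""
print(course_titles(sample))  # ['REST APIs with Python', 'Python API Development']
```

With the driver from the answer this would be course_titles(driver.page_source); the Selenium-native equivalent is [e.text for e in driver.find_elements(By.TAG_NAME, "h3")].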