Udemy website keeps on loading while trying to Web Scrape with Selenium and Python

Question

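I'm starting to learn web scraping. As an exercise, I'm trying to get a list of all the course names that appear for this query: "https://www.udemy.com/courses/search/?src=ukw&q=api+python". The problem is that when I run the script, the page never finishes loading and eventually the window closes. I think Udemy may have some kind of protection against automation.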
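If you want to try other search terms, you can build the query string with `urllib.parse` instead of editing the URL by hand. This is a minimal sketch: `build_search_url` is a hypothetical helper, and the `src=ukw` parameter is simply copied from the URL above (its meaning is not documented here).

```python
from urllib.parse import urlencode

# Base path of Udemy's course search page, taken from the URL in the question.
SEARCH_BASE = "https://www.udemy.com/courses/search/"


def build_search_url(query: str, src: str = "ukw") -> str:
    """Build a Udemy search URL for a free-text query (hypothetical helper).

    urlencode() uses quote_plus() by default, so spaces become '+',
    which matches the 'q=api+python' form seen in the question.
    """
    params = {"src": src, "q": query}
    return SEARCH_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("api python"))
```

Characters such as `+` in the search term are percent-encoded, so a query like `c++` is sent as `q=c%2B%2B` rather than being misread as spaces.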

Here is my code:

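from selenium import webdriver
from selenium.webdriver.common.by import By
import time

website = "https://www.udemy.com/courses/search/?src=ukw&q=api+python"

chrome_options = webdriver.ChromeOptions()
# Hide Chrome's log messages in the console
chrome_options.add_experimental_option("excludeSwitches", ["enable-logging"])
driver = webdriver.Chrome(options=chrome_options)
driver.get(website)
time.sleep(5)  # crude fixed wait for the page to render
# find_elements_by_tag_name() was removed in Selenium 4.3; use find_elements(By..., ...)
matches = driver.find_elements(By.TAG_NAME, "h3")
print([m.text for m in matches])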
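In Selenium the course names would be the `.text` of each element returned by `find_elements(By.TAG_NAME, "h3")`. To check the extraction logic without launching a browser (for example against a copy of `driver.page_source` saved to disk), here is a minimal sketch using the standard library's `html.parser`. It assumes, as the question does, that course titles are rendered inside `<h3>` elements; Udemy's actual markup is not confirmed here and may change. `H3TextCollector` and `h3_titles` are hypothetical names. Unlike Selenium's `.text`, this parser reads raw markup, so it also picks up text from hidden elements.

```python
from html.parser import HTMLParser
from typing import List


class H3TextCollector(HTMLParser):
    """Collect the text content of every <h3> element in an HTML string."""

    def __init__(self) -> None:
        super().__init__()
        self._in_h3 = False      # True while inside an <h3> element
        self._buffer = []        # text fragments of the current <h3>
        self.titles = []         # finished, whitespace-normalized titles

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_h3:
            self._in_h3 = False
            # Join fragments (e.g. text split by <a> or <b>) and collapse whitespace
            text = " ".join("".join(self._buffer).split())
            if text:
                self.titles.append(text)

    def handle_data(self, data):
        if self._in_h3:
            self._buffer.append(data)


def h3_titles(html: str) -> List[str]:
    """Return the text of each <h3> in document order."""
    parser = H3TextCollector()
    parser.feed(html)
    parser.close()
    return parser.titles
```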

Answer 1

The Udemy website probably doesn't load completely because the Selenium-driven Chrome session is detected as a bot, so further navigation is blocked.
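One widely known signal a page can read is `navigator.webdriver`, which browsers set to `true` while under WebDriver control. Whether Udemy checks this particular property is an assumption; it may rely on other signals as well. A minimal sketch for checking what the page sees, where `reports_webdriver` is a hypothetical helper and `driver` is assumed to be a Selenium `webdriver.Chrome` instance after `driver.get(...)`:

```python
def reports_webdriver(driver) -> bool:
    """Return True if page JavaScript sees navigator.webdriver === true.

    `driver` is any object with an execute_script(script) method,
    e.g. a selenium.webdriver.Chrome instance (hypothetical helper).
    """
    # Evaluate the property in the page and compare strictly against True,
    # so undefined (None) or false both count as "not reported".
    return driver.execute_script("return navigator.webdriver") is True
```

With a plain ChromeDriver session this typically returns `True`; with the `--disable-blink-features=AutomationControlled` argument shown below it typically returns `False`.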

Solution

A simple way to evade detection is to add the following argument:

--disable-blink-features=AutomationControlled
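If you reuse these settings across scripts, they can be collected in one helper. This is a sketch under assumptions: `apply_stealth_options` and the two constants are hypothetical names, and `options` is expected to be a Selenium `selenium.webdriver.chrome.options.Options` object. One detail worth knowing: `add_experimental_option` stores values by key, so setting `excludeSwitches` in two separate calls keeps only the last list; all switches have to go in one list.

```python
# Command-line arguments used in the answer's code.
STEALTH_ARGUMENTS = [
    "start-maximized",
    "--disable-blink-features=AutomationControlled",
]
# Passed as ONE list: a second add_experimental_option("excludeSwitches", ...)
# call would replace this value rather than extend it.
EXCLUDED_SWITCHES = ["enable-automation", "enable-logging"]


def apply_stealth_options(options):
    """Add the anti-detection settings to a Chrome options object.

    `options` is expected to be selenium.webdriver.chrome.options.Options
    (or anything with add_argument / add_experimental_option).
    """
    for arg in STEALTH_ARGUMENTS:
        options.add_argument(arg)
    options.add_experimental_option("excludeSwitches", EXCLUDED_SWITCHES)
    options.add_experimental_option("useAutomationExtension", False)
    return options
```

Usage with Selenium 4 would look like `driver = webdriver.Chrome(options=apply_stealth_options(Options()))`. Taking the options object as a parameter keeps the helper testable without a browser.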

So your code block would effectively be:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")
# Pass all switches in ONE list: a second add_experimental_option("excludeSwitches", ...)
# call would overwrite the first instead of adding to it
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option("useAutomationExtension", False)
# Keep Blink from setting navigator.webdriver to true
options.add_argument("--disable-blink-features=AutomationControlled")
# Raw string so the backslashes in the Windows path are not treated as escapes
s = Service(r"C:\BrowserDrivers\chromedriver.exe")
driver = webdriver.Chrome(service=s, options=options)
driver.get("https://www.udemy.com/courses/search/?src=ukw&q=api+python")
# Wait up to 20 s for the results heading before taking the screenshot
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[contains(., 'results for')]")))
driver.save_screenshot("udemy.png")

Saved screenshot:

[Screenshot: udemy.png]
