Selenium doesn't work for a certain website

I'm trying to scrape a dynamic web page with Selenium. Here I try to print all the authors on the website:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://quotes.toscrape.com/js")
elements = driver.find_elements_by_class_name("author")
for i in elements:
    print(i.text)
driver.quit()

This works fine and prints the correct result:

Albert Einstein
J.K. Rowling
Albert Einstein
Jane Austen
Marilyn Monroe
Albert Einstein
André Gide
Thomas A. Edison
Eleanor Roosevelt
Steve Martin

But when I try to use similar code for another website, I get an error:

selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: invalid locator
  (Session info: chrome=98.0.4758.102)

This is my second script:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
url = 'https://www.myperfume.co.il/155567-%D7%9B%D7%9C-%D7%94%D7%9E%D7%95%D7%AA%D7%92%D7%99%D7%9D-%D7%9C%D7%92%D7%91%D7%A8?order=up_title'


driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)
elements = driver.find_elements_by_class_name("title  text-center")
for i in elements:
    print(i.text)
driver.quit()

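What I'm trying to do in this code is print the names of all the perfumes on the web page. After inspecting the page I found that all the names are in elements whose class is 'title text-center'.

How can I fix my code?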

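title text-center is actually 2 class names: title and text-center.
find_elements_by_class_name accepts only a single class name, so to locate elements by 2 class names you have to use XPath or a CSS selector.
So, instead of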

elements = driver.find_elements_by_class_name("title  text-center")

you can use either of the following (both need from selenium.webdriver.common.by import By):

elements = driver.find_elements(By.XPATH, "//h3[@class='title  text-center']")

or

elements = driver.find_elements(By.CSS_SELECTOR, "h3.title.text-center")
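
To make the relationship between the class attribute and the CSS selector concrete, here is a minimal sketch in plain Python. The helper name class_attr_to_css is hypothetical (it is not part of Selenium), and it assumes the class names contain no characters that need CSS escaping.

```python
def class_attr_to_css(class_attr: str, tag: str = "") -> str:
    """Build a compound CSS selector from a space-separated class attribute.

    Hypothetical helper for illustration only; it is not a Selenium API.
    """
    # str.split() with no argument collapses runs of whitespace,
    # so "title  text-center" becomes ["title", "text-center"].
    class_names = class_attr.split()
    if not class_names:
        raise ValueError("class attribute contains no class names")
    # Each class name becomes ".name"; chaining them without spaces
    # means "an element that has all of these classes".
    return tag + "".join("." + name for name in class_names)


selector = class_attr_to_css("title  text-center", tag="h3")
print(selector)  # h3.title.text-center
```

The resulting string can be passed as the selector value together with By.CSS_SELECTOR.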

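Also, you should add a wait so that the web elements are accessed only once they are loaded and ready.
This should be done with an explicit wait on an expected condition, as follows: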

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://www.myperfume.co.il/155567-%D7%9B%D7%9C-%D7%94%D7%9E%D7%95%D7%AA%D7%92%D7%99%D7%9D-%D7%9C%D7%92%D7%91%D7%A8?order=up_title'


driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
wait = WebDriverWait(driver, 20)

driver.get(url)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h3.title.text-center")))
elements = driver.find_elements(By.CSS_SELECTOR, "h3.title.text-center")
for i in elements:
    print(i.text)
driver.quit()
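
A side note on the XPath alternative: @class='title  text-center' compares the whole attribute string, so it only matches if the spacing and class order are exactly the same as in the page source. A common XPath idiom that matches class names as whole tokens regardless of order or extra spaces is sketched below. The helper name xpath_with_classes is hypothetical and only builds the expression string; it assumes the class names contain no quote characters.

```python
def xpath_with_classes(tag: str, *class_names: str) -> str:
    """Build an XPath that matches <tag> elements having all given classes.

    Hypothetical helper for illustration only; it is not a Selenium API.
    """
    if not class_names:
        raise ValueError("at least one class name is required")
    # Pad the normalized class attribute with spaces so each class name
    # can be matched as a whole token, e.g. ' title ' inside ' title text-center '.
    predicates = [
        f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"
        for name in class_names
    ]
    return f"//{tag}[{' and '.join(predicates)}]"


xpath = xpath_with_classes("h3", "title", "text-center")
print(xpath)
```

The resulting string can be passed as the locator value together with By.XPATH. In most cases the CSS selector h3.title.text-center is shorter and easier to read.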

This error message...

selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: invalid locator

...implies that the locator you have used is not valid, as By.CLASS_NAME takes a single class name as an argument.


To print all the perfume names from the webpage you can use the following:

  • Using CSS_SELECTOR and get_attribute("innerHTML"):

    driver.get("https://www.myperfume.co.il/155567-%D7%9B%D7%9C-%D7%94%D7%9E%D7%95%D7%AA%D7%92%D7%99%D7%9D-%D7%9C%D7%92%D7%91%D7%A8?order=up_title")
    print([my_elem.get_attribute("innerHTML") for my_elem in driver.find_elements(By.CSS_SELECTOR, "h3.title")])
    

Ideally you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following:

  • Using CSS_SELECTOR and the text attribute:

    driver.get("https://www.myperfume.co.il/155567-%D7%9B%D7%9C-%D7%94%D7%9E%D7%95%D7%AA%D7%92%D7%99%D7%9D-%D7%9C%D7%92%D7%91%D7%A8?order=up_title")
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h3.title")))])
    
  • Console output:

    [' 212 וי אי פי לגבר א.ד.ט 212 vip for men e.d.t ', ' 212 ניו יורק לגבר א.ד.ט 212 nyc for men e.d.t ', ' 212 סקסי לגבר א.ד.ט 212 sexy men e.d.t ', ' אברקרומבי פירס 100 מל א.ד.ק Abercrombie & Fitch Fierce 100 ml e.d.c ', ' אברקרומבי פירס 50 מל א.ד.ק Abercrombie & Fitch Fierce 50 ml e.d.c ', ' אברקרומבי פירס גודל ענק 200 מל א.ד.ק Abercrombie & Fitch Fierce 200 ml e.d.c ', ' אברקרומבי פירסט אינסטינקט לגבר א.ד.ט  Abercrombie & Fitch First Instinct e.d.t ', ' אגואיסט א.ד.ט Egoiste e.d.t ', ' אגואיסט פלטינום א.ד.ט Egoiste Platinum e.d.t ', ' או דה בלנק א.ד.ט Eau De Blanc e.d.t ', ' או דה פרש א.ד.ט Eau Fraiche e.d.t ', ' אובסיישן לגבר א.ד.ט Obsession for men e.d.t ']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
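
The names in the console output above carry leading and trailing spaces and, in one case, a double space in the middle. If clean names are wanted, a small post-processing step in plain Python may help. This is a minimal sketch; the helper name clean_name is hypothetical, and the sample strings are shortened stand-ins for the scraped text.

```python
def clean_name(raw: str) -> str:
    """Strip surrounding whitespace and collapse internal runs to one space."""
    # split() with no argument drops leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    return " ".join(raw.split())


# Shortened stand-ins for values such as those returned by my_elem.text
raw_names = [
    " Egoiste e.d.t ",
    " Abercrombie & Fitch  First Instinct e.d.t ",
]
cleaned = [clean_name(name) for name in raw_names]
print(cleaned)
```

The same function can be applied inside the list comprehension, for example clean_name(my_elem.text).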