Selenium doesn't work for a certain website
I am trying to scrape dynamic web pages with Selenium.
Here I am trying to print all the authors on a website:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://quotes.toscrape.com/js")
elements = driver.find_elements_by_class_name("author")
for i in elements:
    print(i.text)
driver.quit()
This works fine and prints the correct result:
Albert Einstein
J.K. Rowling
Albert Einstein
Jane Austen
Marilyn Monroe
Albert Einstein
André Gide
Thomas A. Edison
Eleanor Roosevelt
Steve Martin
But when I try to use similar code for another website, I get an error:
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: invalid locator
(Session info: chrome=98.0.4758.102)
This is my second piece of code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
url = 'https://www.myperfume.co.il/155567-%D7%9B%D7%9C-%D7%94%D7%9E%D7%95%D7%AA%D7%92%D7%99%D7%9D-%D7%9C%D7%92%D7%91%D7%A8?order=up_title'
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)
elements = driver.find_elements_by_class_name("title text-center")
for i in elements:
    print(i.text)
driver.quit()
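What I am trying to do in this code is print the names of all the perfumes on the page.
After inspecting the page I found that all the names are in a class called 'title text-center'.
How can I fix my code?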
title text-center is actually 2 class names: title and text-center.
To locate an element by 2 class names, you have to use an XPath or CSS selector.
So, instead of
elements = driver.find_elements_by_class_name("title text-center")
you can use
elements = driver.find_elements(By.XPATH, "//h3[@class='title text-center']")
or
elements = driver.find_elements(By.CSS_SELECTOR, "h3.title.text-center")
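Because a class attribute is a space-separated list of class names, the compound CSS selector can be derived from it mechanically. The helper below is a minimal sketch of that idea. `css_from_classes` is a hypothetical name, not part of Selenium, and it assumes plain class names that need no CSS escaping.

```python
def css_from_classes(class_attr, tag=""):
    """Turn a space-separated class attribute into a compound CSS selector.

    Hypothetical helper, not a Selenium API. Assumes plain class names
    that need no CSS escaping.
    """
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # Each class name becomes ".name"; the optional tag goes in front.
    return tag + "".join("." + name for name in names)


selector = css_from_classes("title text-center", tag="h3")
print(selector)  # h3.title.text-center
```

Note that the two forms are not fully equivalent: the XPath comparison @class='title text-center' matches the attribute value as an exact string, while the CSS form matches any h3 that carries both classes, in any order and alongside other classes.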
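In addition, you should add a wait so that the web elements are accessed only once they are loaded and ready.
This should be done with an explicit wait on an expected condition, as follows: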
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://www.myperfume.co.il/155567-%D7%9B%D7%9C-%D7%94%D7%9E%D7%95%D7%AA%D7%92%D7%99%D7%9D-%D7%9C%D7%92%D7%91%D7%A8?order=up_title'
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
wait = WebDriverWait(driver, 20)
driver.get(url)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h3.title.text-center")))
elements = driver.find_elements(By.CSS_SELECTOR, "h3.title.text-center")
for i in elements:
    print(i.text)
driver.quit()
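To make it clearer what an explicit wait does, here is a simplified, browser-free sketch of the polling idea behind it. This is an illustration only, not Selenium's actual implementation: `wait_until` and `elements_ready` are hypothetical names, and the "page" is simulated by a counter.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Simplified illustration only; Selenium's WebDriverWait also passes
    the driver to the condition and can ignore selected exceptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose elements appear only after some script has run.
attempts = {"count": 0}


def elements_ready():
    attempts["count"] += 1
    # Pretend the elements show up on the third poll.
    return ["h3 #1", "h3 #2"] if attempts["count"] >= 3 else []


found = wait_until(elements_ready, timeout=5.0, poll_interval=0.01)
print(found)  # ['h3 #1', 'h3 #2']
```

The point of the pattern is that the script proceeds as soon as the condition holds, instead of sleeping for a fixed time, and fails with a clear timeout error if it never holds.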
This error message...
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: invalid locator
...implies that the locator strategy you have used is not valid, as By.CLASS_NAME takes a single class name as an argument.
To print all the perfume names on the webpage, you can use the following:
Using css_selector:
driver.get("https://www.myperfume.co.il/155567-%D7%9B%D7%9C-%D7%94%D7%9E%D7%95%D7%AA%D7%92%D7%99%D7%9D-%D7%9C%D7%92%D7%91%D7%A8?order=up_title")
print([my_elem.get_attribute("innerHTML") for my_elem in driver.find_elements(By.CSS_SELECTOR, "h3.title")])
Ideally you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following:
Using CSS_SELECTOR and the text attribute:
driver.get("https://www.myperfume.co.il/155567-%D7%9B%D7%9C-%D7%94%D7%9E%D7%95%D7%AA%D7%92%D7%99%D7%9D-%D7%9C%D7%92%D7%91%D7%A8?order=up_title")
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h3.title")))])
Console output:
[' 212 וי אי פי לגבר א.ד.ט 212 vip for men e.d.t ', ' 212 ניו יורק לגבר א.ד.ט 212 nyc for men e.d.t ', ' 212 סקסי לגבר א.ד.ט 212 sexy men e.d.t ', ' אברקרומבי פירס 100 מל א.ד.ק Abercrombie & Fitch Fierce 100 ml e.d.c ', ' אברקרומבי פירס 50 מל א.ד.ק Abercrombie & Fitch Fierce 50 ml e.d.c ', ' אברקרומבי פירס גודל ענק 200 מל א.ד.ק Abercrombie & Fitch Fierce 200 ml e.d.c ', ' אברקרומבי פירסט אינסטינקט לגבר א.ד.ט Abercrombie & Fitch First Instinct e.d.t ', ' אגואיסט א.ד.ט Egoiste e.d.t ', ' אגואיסט פלטינום א.ד.ט Egoiste Platinum e.d.t ', ' או דה בלנק א.ד.ט Eau De Blanc e.d.t ', ' או דה פרש א.ד.ט Eau Fraiche e.d.t ', ' אובסיישן לגבר א.ד.ט Obsession for men e.d.t ']
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
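The names in the console output above carry leading and trailing spaces. If cleaner strings are wanted, whitespace can be normalized after the names are collected. A minimal sketch, using two names copied from that output; `normalize` is a hypothetical helper, not a Selenium function:

```python
def normalize(name):
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(name.split())


# Two sample values copied from the console output.
raw_names = [
    " אגואיסט א.ד.ט Egoiste e.d.t ",
    " או דה בלנק א.ד.ט Eau De Blanc e.d.t ",
]
clean_names = [normalize(name) for name in raw_names]
print(clean_names)
```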