How to automate a dynamically loaded webpage with Selenium in Python
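I have been trying to automate some tasks with Selenium on a dynamically loaded page, but it cannot locate the dynamically generated elements by XPath, by their values, by tag, or by anything else. It behaves as if it requested the page and got only the raw source, without running the scripts that generate the rest of the content. As a web browser extension it works fine, but once exported to a Python module it does not. Is there a way to make it work in a script as it does in the browser extension?
Say we have a webpage like this: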
<html>
(...)
<body>
<app-root></app-root>
<script src="REDACTED.js" defer></script><script src="REDACTED.js" nomodule defer></script><script src="REDACTED.js" defer></script><script src="REDACTED.js" defer></script><script src="REDACTED.js" defer></script></body>
</html>
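The browser processes these scripts fine and displays the buttons, the UI, and so on. Likewise, interacting with the page through the Selenium IDE browser extension does its job. When I export it to a Python script, it looks like this: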
# Generated by Selenium IDE
import pytest
import time
import json
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

class TestTest1():
    def setup_method(self, method):
        self.driver = webdriver.Chrome()
        self.vars = {}

    def teardown_method(self, method):
        self.driver.quit()

    def test_test1(self):
        self.driver.get("https://REDACTED")
        # Logging in to the portal I am interacting with - works fine
        logform = self.driver.find_element_by_id("loginForm")
        logform.find_element_by_name("username").send_keys('REDACTED')
        logform.find_element_by_name("password").send_keys('REDACTED')
        logform.find_element_by_name("login").click()
        # Below code does not do anything, as it's not able to find these elements
        # on the webpage and interact with them
        self.driver.find_element(By.CSS_SELECTOR, ".btn--cta").click()
        self.driver.find_element(By.CSS_SELECTOR, ".create-flow-btn--major > img").click()
        element = self.driver.find_element(By.CSS_SELECTOR, ".create-flow-btn--major > img")
        actions = ActionChains(self.driver)
        actions.move_to_element(element).perform()
        element = self.driver.find_element(By.CSS_SELECTOR, "body")
        actions = ActionChains(self.driver)
        actions.move_to_element(element, 0, 0).perform()
        self.driver.close()

p1 = TestTest1()
p1.setup_method(1)
p1.test_test1()
p1.teardown_method(1)
After running it, the script stops when it tries to find the requested element and throws the following exception:
(...) line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".btn--cta"}
What should I do?
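As I mentioned in the comments, you need an explicit wait so that the element is visible on the page before you interact with it. Use WebDriverWait() together with the visibility_of_element_located() condition and your locator.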
WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "something")))
You need the following imports.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By