Selenium will not load the full page source; it stops partway through the CSS styles and then cuts off
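I have tried several answers on Stack Overflow, but none of them helped. When I print the page source of the web page, I only see the source up to a certain point inside the <head> tag, give or take a few characters. The HTML elements after that point are never loaded or printed in the page source. When I try to load HTML elements that *should* exist (they are there when I view the page source in Chrome), I get a TimeoutException or a NoSuchElementException.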
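I am parsing a dynamically loaded website after getting through a multi-factor authentication (MFA) portal. I printed driver.current_url to make sure I am on the correct URL after MFA, tried sleep(100), and tried explicit waits with EC.url_contains(...), EC.element_to_be_clickable(...) and EC.presence_of_element_located(...).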
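For reference, WebDriverWait(driver, timeout).until(...) accepts any callable that takes the driver and returns a truthy value when the wait should stop, so custom conditions can be used alongside the built-in expected_conditions. The sketch below is a minimal illustration; `document_ready` and `page_source_contains` are hypothetical helper names, not part of Selenium. Note that document.readyState reaching "complete" does not guarantee that content injected later by JavaScript is already present.

```python
def document_ready(driver):
    """Hypothetical wait condition: True once document.readyState is "complete".

    WebDriverWait calls this repeatedly with the driver until it returns a
    truthy value or the timeout expires.
    """
    return driver.execute_script("return document.readyState") == "complete"


class page_source_contains:
    """Hypothetical wait condition: True once `text` appears in driver.page_source."""

    def __init__(self, text):
        self.text = text

    def __call__(self, driver):
        return self.text in driver.page_source


# Usage with a real, logged-in session (requires selenium):
# from selenium.webdriver.support.ui import WebDriverWait
# WebDriverWait(driver, 100).until(document_ready)
# WebDriverWait(driver, 100).until(page_source_contains("bannerTitle"))
```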
Here is my code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = "https://brightspace.nyu.edu/d2l/home"
driver = webdriver.Chrome() # should open a Chrome window
driver.get(url) # navigate to brightspace
# MFA Handling Code here #
# Explicitly wait until we reach the Brightspace home page (logged in)
element = WebDriverWait(driver,100).until(EC.url_contains('https://brightspace.nyu.edu/d2l/home'))
print(driver.page_source)
banner = driver.find_element(By.ID, 'bannerTitle') # throws NoSuchElementException
Here is part of the output:
<!-- ... previous styles and HTML in <head> ... -->
<style is="custom-style">html {
--d2l-color-woolonardo: var(--d2l-color-sylvite);
.
. lots of colors
.
--d2l-color-olivine-light-1: var(--d2l-color-olivine-plus-1);
--d2l
<!-- ^^ the page source cuts off here, in <head> -->
The last line raises the following error:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="bannerTitle"]"}
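If the printed source stops mid-token, as in the output above, it can help to rule out a display limit first, since some consoles and IDE output panes only keep a limited amount of text. The sketch below writes driver.page_source to a file and checks for expected markers instead of printing it. `dump_page_source`, `find_markers` and the default file name are hypothetical; `driver` is assumed to be any object with a `page_source` attribute, such as the Chrome driver above.

```python
from pathlib import Path


def dump_page_source(driver, path="page_source.html"):
    """Write driver.page_source to a file (hypothetical helper).

    Returns the number of characters written, which can be compared with
    how much text the console actually showed.
    """
    source = driver.page_source
    Path(path).write_text(source, encoding="utf-8")
    return len(source)


def find_markers(source, markers):
    """Map each expected marker (e.g. "</head>", "bannerTitle") to whether it occurs."""
    return {marker: marker in source for marker in markers}


# Usage with the session above:
# n = dump_page_source(driver)
# print(n, find_markers(driver.page_source, ["</head>", "bannerTitle"]))
```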
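I would suggest using WebDriverWait, By and EC to wait for the banner instead of looking it up immediately with find_element. I would also move print(driver.page_source) to after the banner has been found. We can also try scrolling down the page. Below, I have commented out some of your lines and added my suggested updates.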
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = "https://brightspace.nyu.edu/d2l/home"
driver = webdriver.Chrome() # should open a Chrome window
driver.get(url) # navigate to brightspace
# MFA Handling Code here #
# Explicitly wait until we reach the Brightspace home page (logged in)
element = WebDriverWait(driver,100).until(EC.url_contains('https://brightspace.nyu.edu/d2l/home'))
# print(driver.page_source)
# banner = driver.find_element(By.ID, 'bannerTitle') # throws NoSuchElementException
##################################
######## NEW SUGGESTIONS #########
##################################
# Wait up to 100 s for the banner to be visible (not just present in the DOM)
banner = WebDriverWait(driver, 100).until(EC.visibility_of_element_located(
    (By.ID, "bannerTitle")))
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
print(driver.page_source)
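If the page lazy-loads content as it is scrolled, a single scrollTo call may not be enough. A minimal sketch, assuming the page grows when scrolled to the bottom: repeat the scroll until document.body.scrollHeight stops changing. `scroll_to_bottom`, `pause` and `max_rounds` are hypothetical names and defaults to tune for the site.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll until document.body.scrollHeight stops growing (hypothetical helper).

    Returns the number of scroll rounds performed (at most max_rounds).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # height unchanged: assume nothing more is loading
        last_height = new_height
    return max_rounds


# Usage with the session above:
# scroll_to_bottom(driver)
# print(driver.page_source)
```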