Why can't Selenium and Firefox WebDriver scrape website tags loaded by AJAX?
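I want to get the text of some HTML tags from bonbast, some of which are loaded by AJAX (for example, the tag with the id "ounce_top"). I have tried Selenium with geckodriver, but I still can't scrape these tags. Also, when the automated Firefox (geckodriver) opens, these elements don't even show up on the page! I don't know why this happens. How can I scrape this website?
Code trials: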
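from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
url_news = 'https://bonbast.com/'
driver = webdriver.Firefox()
driver.get(url_news)
# page_source is read immediately after get(), before any AJAX content has arrived
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
# Selenium 4 style lookup (find_element_by_id was removed); fails if the element isn't there yet
a = driver.find_element(By.ID, "ounce_top")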
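The desired element is a dynamic element, so to extract its text (i.e. 1,817.43) you ideally need to induce a WebDriverWait for visibility_of_element_located(). The snippets below first click the cookie consent button, and you can use either of the following locator strategies:
Using CSS_SELECTOR: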
driver.get("https://bonbast.com/")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.btn.btn-primary.btn-sm.acceptcookies"))).click()
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span#ounce_top"))).text)
Using XPATH:
driver.get("https://bonbast.com/")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.btn.btn-primary.btn-sm.acceptcookies"))).click()
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[@id='ounce_top']"))).text)
Console output:
1,817.43
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in
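The scraped value comes back as a string with a thousands separator (1,817.43), so it has to be cleaned before it can be used as a number. A minimal sketch, using a hypothetical parse_price helper and assuming that on this page ',' is always the thousands separator and '.' the decimal point:

```python
from decimal import Decimal


def parse_price(text: str) -> Decimal:
    """Hypothetical helper: turn a scraped price such as '1,817.43' into a Decimal.

    Assumes ',' is only ever a thousands separator and '.' the decimal point.
    """
    cleaned = text.strip().replace(",", "")  # drop whitespace and thousands separators
    return Decimal(cleaned)


if __name__ == "__main__":
    print(parse_price("1,817.43"))  # 1817.43
```

Decimal is used instead of float so the value keeps exactly the digits shown on the page; if you only need an approximate number, float(cleaned) works the same way.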
To do this with Selenium you need to add a wait/delay. Preferably, use an explicit wait with expected conditions.
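An explicit wait is essentially a polling loop. As far as I know, Selenium's WebDriverWait.until calls the expected condition repeatedly (every 0.5 s by default), treats NoSuchElementException as "not ready yet", and raises TimeoutException once the timeout expires. The stdlib-only sketch below mimics that loop with a hypothetical wait_until helper so the mechanism is visible; it is not Selenium's code, and it raises the built-in TimeoutError instead.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], T],
    timeout: float = 20.0,
    poll: float = 0.5,
    ignored: Tuple[Type[BaseException], ...] = (),
) -> T:
    """Hypothetical helper: poll `condition` until it returns something truthy.

    Exceptions listed in `ignored` count as "not ready yet", similar in spirit
    to an element lookup failing before AJAX has inserted the element.
    Raises TimeoutError if no truthy result arrives within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # not ready yet; try again after the next poll interval
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


if __name__ == "__main__":
    attempts = iter([None, None, "1,817.43"])
    # Stand-in for a DOM lookup that only succeeds on the third poll
    print(wait_until(lambda: next(attempts), timeout=5, poll=0.01))
```

This is why a plain find_element right after get() can fail: it makes a single attempt, whereas the wait keeps retrying until the AJAX response has filled in the element.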
I guess you want to get the text value inside that element?
This should work:
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url_news = 'https://bonbast.com/'
driver = webdriver.Firefox()
wait = WebDriverWait(driver, 20)
driver.get(url_news)
# Wait (up to 20 s) until the AJAX-loaded element is visible, then read its text
your_gold_value = wait.until(EC.visibility_of_element_located((By.ID, "ounce_top"))).text
# Read page_source only after the wait, so the snapshot includes the AJAX-loaded content
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
print(your_gold_value)
driver.quit()
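driver.page_source is, roughly, a snapshot of the DOM at the moment it is read, so parsing it with BeautifulSoup only sees what AJAX has already inserted. The sketch below illustrates that with two made-up HTML strings standing in for "before" and "after" snapshots. It swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages; text_by_id and _TextById are hypothetical names (with BeautifulSoup the equivalent lookup would be soup.find(id="ounce_top")).

```python
from html.parser import HTMLParser
from typing import Optional

# Elements that never have a closing tag, so they must not change the nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _TextById(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # > 0 while the parser is inside the target element
        self.found = False
        self.parts = []      # text chunks seen inside the target element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 0 if tag in VOID_TAGS else 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> hold no text; only record a self-closing target
        if not self.depth and not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_id(html: str, element_id: str) -> Optional[str]:
    """Hypothetical helper: text of the element with this id, or None if absent."""
    parser = _TextById(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.found else None


if __name__ == "__main__":
    # Made-up snapshots: before and after an AJAX call fills in the price
    before = '<div id="prices"></div>'
    after = '<div id="prices"><span id="ounce_top">1,817.43</span></div>'
    print(text_by_id(before, "ounce_top"))  # None
    print(text_by_id(after, "ounce_top"))   # 1,817.43
```

The same lookup gives None or the price depending only on when the snapshot was taken, which is why the wait has to come before reading page_source.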