How to parse a dynamic DOM element?

I want to build a parser that scrapes prices, but I can't find a working way to read an element's innerHTML.

I don't know why, but Selenium (getAttribute(innerHTML)), PhantomJS (page.evaluate(function(){return document.ElementToParse.innerHTML})) and scrapy-splash (loading the page with WebPageEngine and parsing the HTML) all fail on this site. The result is always an empty "[]", null, or a WebElement object.

I tested my code on Banggood product pages and landing pages, but the result is always the same.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("https://www.banggood.com/BlitzWolf-Ampcore-Turbo-TC10-3A-Durable-USB-Type-C-Charging-Data-Cable-p-1188424.html?rmmds=category&cur_warehouse=CN") #random url
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "item_now_price"))
    )
finally:
    driver.quit()
print(element)

and the output is:

<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="b0593791-138b-4177-a8f3-e7983143824a", element="d08f4717-d3f1-4594-8f2b-1bf943deb9f9")>

when what I need is something like:

6.59 (or US$6.59)

I also tried

price = driver.find_element_by_class_name('item_now_price').getAttribute("innerHTML")

var page = require('webpage').create();

page.open('https://www.banggood.com/BlitzWolf-Ampcore-Turbo-TC10-3A-Durable-USB-Type-C-Charging-Data-Cable-p-1188424.html?rmmds=category&cur_warehouse=CN', function(status) {

    var price = page.evaluate(function() {
        return document.getElementByClassName('item_now_price').innerHTML;
        });
console.log('price is ' + price);
phantom.exit();
});

but the result is empty, and when I add

page.includeJs(/url/to/js)

the terminal stops responding.

Once you have the element in Selenium, you can use .text to get that element's text.

See the slight adjustment to your first example below:

try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "item_now_price"))
    )
    print(element.text)  # read the text while the browser session is still open
finally:
    driver.quit()

See if that gets you the result you want.
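One thing that may be worth guarding against: as far as I know, .text only returns text that is actually rendered, so it can come back empty if the price element exists in the DOM but is not displayed. Here is a minimal sketch of a helper that falls back to innerHTML in that case. The names element_text and FakeElement are made up for illustration; FakeElement is just a stand-in so the helper can be tried without a browser, and the helper assumes the object passed in behaves like a Selenium WebElement.

```python
def element_text(element):
    """Return the element's visible text, falling back to its innerHTML.

    `element` is assumed to behave like a Selenium WebElement: it exposes a
    `.text` property and a `get_attribute(name)` method.
    """
    text = element.text  # rendered (visible) text only
    if text:
        return text.strip()
    # .text was empty, so fall back to the raw markup inside the element.
    html = element.get_attribute("innerHTML")
    return html.strip() if html else ""


class FakeElement:
    """Hypothetical stand-in for a WebElement, for trying the helper offline."""

    def __init__(self, text, inner_html):
        self.text = text
        self._inner_html = inner_html

    def get_attribute(self, name):
        # Only innerHTML is modelled here; anything else returns None.
        return self._inner_html if name == "innerHTML" else None
```

With a real driver you would pass the element returned by WebDriverWait(...).until(...) to element_text before calling driver.quit().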

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.banggood.com/BlitzWolf-Ampcore-Turbo-TC10-3A-Durable-USB-Type-C-Charging-Data-Cable-p-1188424.html?rmmds=category&cur_warehouse=CN") #random url
try:
    # Wait up to 10 seconds for the price element, then read its text
    # before the browser is closed
    price = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "item_now_price"))
    ).text
finally:
    driver.quit()
print(price)
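If you need the bare number (6.59) rather than the displayed string, the text can be post-processed. The following is only a sketch under some assumptions: parse_price and PRICE_PATTERN are names I made up, the text is assumed to look like "US$6.59" or "6.59", and commas are assumed to be thousands separators (which would be wrong for locales that use a comma as the decimal mark).

```python
import re
from decimal import Decimal

# First run of digits with an optional decimal part, e.g. "6.59" in "US$6.59".
PRICE_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def parse_price(text):
    """Extract the first number from a price string, or None if there is none.

    Assumption: commas are thousands separators and are simply dropped.
    """
    match = PRICE_PATTERN.search(text.replace(",", ""))
    return Decimal(match.group()) if match else None
```

Decimal is used instead of float so the price is not subject to binary rounding; swap in float(match.group()) if that does not matter for your use.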