Not all HTML can be accessed in python-requests-html

I am trying to run a script that simply looks up some numbers on a website, but it doesn't seem to want to let me get past a certain point. With this script:

from requests_html import HTMLSession
import requests

url = "https://auction.chimpers.xyz/"
try:
    s = HTMLSession()
    r = s.get(url)
except requests.exceptions.RequestException as e:
    print(e)

# Execute the page's JavaScript before querying the DOM
r.html.render(sleep=1)

title = r.html.find("title", first=True).text
print(title)

divs_found = r.html.find("div")
print(divs_found)

meta_desc = r.html.xpath('//*[@id="description-view"]/div', first=True)
print(meta_desc)

price = r.html.find(".m-complete-info div", first=True)
print(price)

The output is:

Chimpers Genesis 100  
[<Element 'div' id='app'>, <Element 'div' data-v-1d311e85='' id='m-connection' class=('manifold',)>, <Element 'div' id='description-view'>, <Element 'div' class=('manifold', 'm-complete-view')>, <Element 'div' data-v-cf8dbfe2='' class=('manifold', 'loading-screen')>, <Element 'div' class=('manifold-logo',)>]
<Element 'div' class=('manifold', 'm-complete-view')>  
None  
[Finished in 3.9s]

Website: https://auction.chimpers.xyz/

and the information I am trying to find is here

Clearly there are more HTML elements than the ones printed in that list, but every time I try to access them it returns None, even with r.html.xpath('//*[@id="description-view"]/div/div[2]/div/div[2]/span/span[1]'), which is the XPath I copied from the element inspector in Google Chrome.
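A quick way to check what requests_html actually ended up with after render() is to dump the rendered DOM to a file and search it for the markup you expect; a minimal diagnostic sketch, reusing r from the script above (the filename rendered.html is just an illustrative choice, and m-price-data is the class name of the price element that the Selenium answer below waits for):

# Dump the DOM that requests_html rendered so it can be inspected directly
with open("rendered.html", "w", encoding="utf-8") as f:
    f.write(r.html.html)

# Quick check: did the price markup ever make it into the rendered HTML?
print("m-price-data" in r.html.html)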

What is the reason for this, and how would I go about dealing with it?

I don't even know whether it is possible with requests_html, but it is possible with selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options

url = "https://auction.chimpers.xyz/"
class_names = ["m-price-label", "m-price-data"]

driver_options = Options()
driver_options.add_argument("--headless")
driver = webdriver.Chrome(options=driver_options)
driver.get(url)

results = {}

try:
    for class_name in class_names:
        # Wait up to 10 seconds for the JavaScript to render the element with this class
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, class_name))
        )
        # Getting the inner text of the html tag
        results[class_name] = element.get_attribute("textContent")
finally:
    driver.quit()

print(results)

Feel free to use another webdriver instead of Chrome.
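For example, the same script should work with Firefox; a minimal sketch of just the driver setup, assuming geckodriver is installed and on your PATH (the rest of the script above stays unchanged):

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# Swap Chrome for Firefox/geckodriver; only the driver setup changes
firefox_options = Options()
firefox_options.add_argument("--headless")
driver = webdriver.Firefox(options=firefox_options)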