Not all HTML can be accessed in python-requests-html
I am trying to run a script that simply looks up some numbers on a website, but it doesn't seem to let me get past a certain point. Here is the script:
from requests_html import HTMLSession
import requests

url = "https://auction.chimpers.xyz/"

try:
    s = HTMLSession()
    r = s.get(url)
except requests.exceptions.RequestException as e:
    print(e)

r.html.render(sleep=1)

title = r.html.find("title", first=True).text
print(title)

divs_found = r.html.find("div")
print(divs_found)

meta_desc = r.html.xpath('//*[@id="description-view"]/div', first=True)
print(meta_desc)

price = r.html.find(".m-complete-info div", first=True)
print(price)
The result is:
Chimpers Genesis 100
[<Element 'div' id='app'>, <Element 'div' data-v-1d311e85='' id='m-connection' class=('manifold',)>, <Element 'div' id='description-view'>, <Element 'div' class=('manifold', 'm-complete-view')>, <Element 'div' data-v-cf8dbfe2='' class=('manifold', 'loading-screen')>, <Element 'div' class=('manifold-logo',)>]
<Element 'div' class=('manifold', 'm-complete-view')>
None
[Finished in 3.9s]
Website: https://auction.chimpers.xyz/
and the information I am trying to find is here
Clearly there are more HTML elements beyond the ones printed in that list, but every time I try to access them I get None, even with r.html.xpath('//*[@id="description-view"]/div/div[2]/div/div[2]/span/span[1]'), which is the exact XPath I copied via Inspect in Google Chrome.
What is causing this, and how do I deal with it?
I don't know whether this can even be done with requests_html, but it can be done with selenium:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options

url = "https://auction.chimpers.xyz/"
class_names = ["m-price-label", "m-price-data"]

driver_options = Options()
driver_options.add_argument("--headless")

driver = webdriver.Chrome(options=driver_options)
driver.get(url)

results = {}
try:
    for class_name in class_names:
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, class_name))
        )
        # Getting the inner text of the html tag
        results[class_name] = element.get_attribute("textContent")
finally:
    driver.quit()

print(results)
Feel free to use another web driver instead of Chrome.
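The script above prints the raw textContent of each matched element. If you need the numeric value rather than the label string, a small amount of string parsing is enough. This is just a sketch: the sample strings below are hypothetical, since I don't know the exact text the auction page renders into those elements.

```python
import re

def extract_number(text):
    """Pull the first decimal number out of a label such as '2.5 ETH'."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None

# Hypothetical stand-in for the `results` dict the selenium script prints
results = {"m-price-label": "Current bid", "m-price-data": "2.5 ETH"}
print(extract_number(results["m-price-data"]))  # 2.5
```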