尝试获取一些页面数据时获取 AttributeErrors

Getting AttributeErrors while trying to grab some page data

美好的一天。我试图从 url 中获取一些数据,但是,由于遇到错误,我的脚本只有几行有效。任何想法都可以。谢谢

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import requests

header = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"}

url = "https://www.pinksale.finance/#/pinklock/detail/0x5f7faccaff14ce5fe5ae2ff5bb1ea2fa1b7fc526?chain=BSC"
print ("Link:", url)

urlpage = requests.get(url, headers=header, timeout=10, allow_redirects=False)
site = BeautifulSoup(urlpage.content, 'html.parser')

item1 = site.find('div', class_='ant-list-item').get_text()
item2 = site.find('div', class_='Total Amount Locked').get_text()
item3 = site.find('div', class_='Total Values Locked').get_text()

print ("item1: ", item1)
print ("item2: ", item2)
print ("item3: ", item3)

当前输出:

Link: https://www.pinksale.finance/#/pinklock/detail/0x5f7faccaff14ce5fe5ae2ff5bb1ea2fa1b7fc526?chain=BSC
    AttributeError: 'NoneType' object has no attribute 'get_text'

想要的输出:

0x7A9b...c0b2    7.104591955602949963  2022.04.04 16:32 UTC
Total Amount Locked 7.10459195560294996
Total Values Locked ,843

url完全取决于JavaScript。所以你需要像 selenium 这样的自动化。现在它按预期工作。您可以 运行 代码。

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

url ='https://www.pinksale.finance/#/pinklock/detail/0x5f7faccaff14ce5fe5ae2ff5bb1ea2fa1b7fc526?chain=BSC'
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.maximize_window()
driver.get(url)
time.sleep(8)

site = BeautifulSoup(driver.page_source, 'lxml')
driver.close()

item1 = ','.join([x.get_text().strip() for x in site.select('.ant-spin-container div ul li div.LockRecord_tvl__1cBpD')])
item2 = [x.get_text() for x in site.select('td.has-text-right')]

print ("item1: ", item1)
print ("item2: ", item2[0])
print ("item3: ", item2[1])

输出:

item1: 0x7A9b...c0b2,7.104591955602949963,2022.04.04 16:32 UTC      
item2:  7.104591955602949963
item3:  ,711