Python scraping with BeautifulSoup does not extract some rows of data correctly
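I am exploring web scraping in Python. I have the following code snippet, but some of the rows it extracts contain the wrong data. What could be wrong with this code?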
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = 'https://bscscan.com/txsinternal?ps=100&zero=false&valid=all'
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req, timeout=10).read()
soup = BeautifulSoup(webpage, 'html.parser')
rows = soup.findAll('table')[0].findAll('tr')

for row in rows[1:]:
    ttype = (row.find_all('td')[3].text[0:])
    amt = (row.find_all('td')[7].text[0:])
    transamt = str(amt)
    print()
    print("this is bnbval: ", transamt)
    print("transactiontype: ", ttype)
Sample output:
trans amt: Binance: WBNB Token #- wrong data being extracted
transtype: 0x2de500a9a2d01c1d0a0b84341340f92ac0e2e33b9079ef04d2a5be88a4a633d4 #- wrong data being extracted
trans amt: 1 BNB
transtype: call
trans amt: 1 BNB
transtype: call
this is bnbval: Binance: WBNB Token #- wrong data being extracted
transactiontype: 0x1cc224ba17182f8a4a1309cb2aa8fe4d19de51c650c6718e4febe07a51387dce #- wrong data being extracted
trans amt: 1 BNB
transtype: call
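There is nothing wrong with your code. The problem is in the page data.
Some rows have 7 columns, as you expect, and some rows have 9 columns. The 9-column rows are the ones giving you the wrong data.
You can open the page and inspect the elements to see the problem for yourself.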
I suggest using the last element, [-1], instead of [7]. You will still need some kind of if check for column 3, though.
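The suggestion above could be sketched roughly as follows. This is only an illustration, not a verified fix for the live page: `pick_fields` and `scrape_rows` are hypothetical helper names, and the sketch assumes that the amount is always in the last cell and that a wide row can be recognised because column 3 holds a transaction hash ("0x" plus 64 hex characters, as in the sample output). Where the type lives in those wide rows is not known from the question, so the sketch returns `None` for it instead of guessing.

```python
import re

# Assumption: a transaction hash is "0x" followed by 64 hex characters,
# matching the values seen in the sample output.
TX_HASH = re.compile(r"^0x[0-9a-fA-F]{64}$")


def pick_fields(cells):
    """Return (type, amount) from one row's cell texts, or None if the row is too short."""
    if len(cells) < 4:
        return None  # not a data row: there is no column 3 to read
    amount = cells[-1].strip()  # last cell, whatever the row width is
    ttype = cells[3].strip()
    if TX_HASH.match(ttype):
        # Column 3 holds a hash in this row, so the layout is shifted;
        # report the type as unknown rather than printing the hash.
        ttype = None
    return ttype, amount


def scrape_rows(html):
    """Parse the first table in `html` and collect (type, amount) pairs."""
    # Imported here so pick_fields can be used without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    results = []
    for row in table.find_all("tr")[1:]:  # skip the header row
        cells = [td.get_text() for td in row.find_all("td")]
        fields = pick_fields(cells)
        if fields is not None:
            results.append(fields)
    return results
```

Passing the `webpage` bytes from the question to `scrape_rows` would give a list of pairs to print. Rows whose type comes back as `None` are the ones worth inspecting in the browser to find which column actually holds the type.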