Web scraping with python - Dynamic table data isn't downloaded

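I want to get the transaction times from the website https://explorer.flitsnode.app/address/FieXP1irJKvmWUiqV18AFdDZD8bgWvfRiC/ but when I request the HTML I don't get the complete page data.

I get everything except the contents of the table I need, the address transactions table.

I have the CSS selector for the table, #txaddr, but it only returns the header row (Timestamp, Block, Hash, ...).

My code so far, with some comments added: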

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

def NodeRewardTime(link):
    # Send a browser-like User-Agent header with the request
    req = Request(link, headers={'User-Agent': 'Mozilla/5.0'})
    webpage = urlopen(req).read()
    soup = BeautifulSoup(webpage, 'html5lib')  # pip install html5lib
    all_results = soup.select("#txaddr")  # CSS selector for the entire table
    if not all_results:
        print("No data to show")
    for x in all_results:
        print(x.text)  # prints results

link = "https://explorer.flitsnode.app/address/FieXP1irJKvmWUiqV18AFdDZD8bgWvfRiC/"

NodeRewardTime(link)
input("End")

Output: TimestampBlockHashAmount (FLS)Balance (FLS)TX Type [End]

You have to fetch each whole row and clean it up in a loop, so that the output shows only what you need.
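A minimal sketch of such a loop, assuming each row arrives as a list of six strings in the same order as the table header (Timestamp, Block, Hash, Amount, Balance, TX Type). The `rows` sample and the `timestamps_only` helper are illustrative names, not part of the site or of the code in the question.

```python
# Two sample rows in the assumed shape: six strings per row, in header order.
rows = [
    ["2020-08-14 00:00", "562586",
     "cfc5fc6e81c0f31aaac85c2e3e6e727ce00cfdf4b938e7092472ce6f549b7fbf",
     "3.67999999", "1003.67999999", "MASTERNODE"],
    ["2020-08-13 16:37", "562211",
     "68f08eefef36aecd33645b13f3c95d0c3160ade5bc180b1f3b32ced670d97bef",
     "-3.67999999", "1000.00000000", "OUT"],
]


def timestamps_only(rows):
    """Return only the first column (the timestamp) of every row."""
    result = []
    for row in rows:
        result.append(row[0])  # index 0 is the Timestamp column
    return result


for ts in timestamps_only(rows):
    print(ts)
```

If the column order on the site ever differs from the header, the index would need adjusting.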

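If you inspect the page, you can see that the table data is not part of the HTML; it is loaded separately in JSON format from the site's get_address_transactions endpoint.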
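The address page URL and the JSON URL used in this answer differ only in the path and in the lowercased address, so one can probably be derived from the other. The sketch below assumes that pattern holds for other addresses, which has not been verified; `transactions_url` is a hypothetical helper, and whether the lowercase form is actually required is unknown.

```python
from urllib.parse import urlparse, urlencode


def transactions_url(address_page_url):
    """Build the JSON endpoint URL from an address page URL (assumed pattern)."""
    parts = urlparse(address_page_url)
    # The path looks like /address/<ADDRESS>/ ; take the last non-empty segment.
    address = [seg for seg in parts.path.split("/") if seg][-1]
    # Assumption: the endpoint takes the address lowercased, as in this answer.
    query = urlencode({"address": address.lower()})
    return f"{parts.scheme}://{parts.netloc}/get_address_transactions?{query}"


page = "https://explorer.flitsnode.app/address/FieXP1irJKvmWUiqV18AFdDZD8bgWvfRiC/"
print(transactions_url(page))
```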

The following prints the data in table format:

from urllib.request import Request, urlopen
import json


def NodeRewardTime(link):
    # Send a browser-like User-Agent header with the request
    req = Request(link, headers={"User-Agent": "Mozilla/5.0"})
    webpage = urlopen(req).read()

    # The response body is JSON, so parse it directly; no HTML parser is needed
    json_data = json.loads(webpage)

    # Each entry in "data" is one table row; join its fields with " | "
    return "\n".join(" | ".join(map(str, i)) for i in json_data["data"])


URL = "https://explorer.flitsnode.app/get_address_transactions?address=fiexp1irjkvmwuiqv18afddzd8bgwvfric"
print(NodeRewardTime(URL))

Output:

2020-08-14 00:00 | 562586 | cfc5fc6e81c0f31aaac85c2e3e6e727ce00cfdf4b938e7092472ce6f549b7fbf | 3.67999999 | 1003.67999999 | MASTERNODE
2020-08-13 16:37 | 562211 | 68f08eefef36aecd33645b13f3c95d0c3160ade5bc180b1f3b32ced670d97bef | -3.67999999 | 1000.00000000 | OUT
2020-08-12 18:58 | 561193 | 31958481f27f3d40ef5df4f437169f169f58b7b9556cc8ea5c381d4daf6d96b2 | 3.67999999 | 1003.67999999 | MASTERNODE
2020-08-11 22:00 | 560155 | 7ae289b8250fd94af10aa5e0a884149f548c7e3d1c6e05e7d78ac80284b3833a | -36.79999990 | 1000.00000000 | OUT
2020-08-11 15:02 | 559828 | 618185e5f12436e4c5fc97d45d36098ca56662780bbd037abfedfa316219571e | 3.67999999 | 1036.79999990 | MASTERNODE
2020-08-10 14:52 | 558579 | 3afeaa5e9e9130f03fac0303de680d790d075f1bbbae95e730bcf90fc33b82b9 | 3.67999999 | 1033.11999991 | MASTERNODE
2020-08-09 12:37 | 557281 | 0943156c88cc667502aef84b8143ba89f84cc069e342c86e028cae034abf3b36 | 3.67999999 | 1029.43999992 | MASTERNODE
2020-08-08 12:10 | 556044 | 31f56c608a02ae8f90b0e113dc60a4f35eec86b91c0be7242c4409bab2f4ece2 | 3.67999999 | 1025.75999993 | MASTERNODE
2020-08-07 09:07 | 554717 | 3e3e73db2491dec2071088a080a86567d769a6979c0304bfc26bfa194bfa8e5f | 3.67999999 | 1022.07999994 | MASTERNODE
2020-08-06 07:47 | 553471 | 92605aff1c7ee92302323b22ea4b2d812e71afa3e07be8a80e8a62d3f7281314 | 3.67999999 | 1018.39999995 | MASTERNODE
2020-08-05 04:47 | 552123 | 286261dc57262a2d2e34e1e3fd8c008946d6a08cf8a00617b2b66c14af3f2a82 | 3.67999999 | 1014.71999996 | MASTERNODE
2020-08-04 02:14 | 550794 | ccc75788a0b2c1b441fe9f2c3594c39ce9dcc90583112d795fd3666942c0014d | 3.67999999 | 1011.03999997 | MASTERNODE
2020-08-02 22:32 | 549388 | d2587f7a8adf268b881a22cf8b441382093916a95ab1c9f2f91c8a0ce59a281b | 3.67999999 | 1007.35999998 | MASTERNODE
2020-08-01 23:04 | 548196 | 1279fada75e56f2397288ce9eb4fcc7d04d10b15ea646189df75a117a2585707 | 3.67999999 | 1003.67999999 | MASTERNODE
... and so on
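Since the question asks for the transaction times, the rows can be narrowed further. The sketch below assumes each row has exactly the six fields shown in the output above and that the timestamp format is `YYYY-MM-DD HH:MM`; `reward_times` and `sample` are illustrative names. It keeps only rows whose TX Type is MASTERNODE and converts their timestamps to `datetime` objects.

```python
from datetime import datetime


def reward_times(rows):
    """Return the timestamps of MASTERNODE rows as datetime objects."""
    times = []
    for timestamp, block, tx_hash, amount, balance, tx_type in rows:
        if tx_type == "MASTERNODE":
            times.append(datetime.strptime(timestamp, "%Y-%m-%d %H:%M"))
    return times


# The first three rows of the output above, as lists of strings.
sample = [
    ["2020-08-14 00:00", "562586",
     "cfc5fc6e81c0f31aaac85c2e3e6e727ce00cfdf4b938e7092472ce6f549b7fbf",
     "3.67999999", "1003.67999999", "MASTERNODE"],
    ["2020-08-13 16:37", "562211",
     "68f08eefef36aecd33645b13f3c95d0c3160ade5bc180b1f3b32ced670d97bef",
     "-3.67999999", "1000.00000000", "OUT"],
    ["2020-08-12 18:58", "561193",
     "31958481f27f3d40ef5df4f437169f169f58b7b9556cc8ea5c381d4daf6d96b2",
     "3.67999999", "1003.67999999", "MASTERNODE"],
]

for t in reward_times(sample):
    print(t.isoformat(sep=" ", timespec="minutes"))
```

With live data, `sample` would be replaced by the `"data"` list from the parsed JSON response.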