Unpacking nested JSON into a pandas DataFrame

I would like help with a .json file I have. I have looked extensively at pd.json_normalize(), but I cannot get the result into the right shape.

The line I started experimenting with is:

result_df = pd.json_normalize(cgcryptohistory_data)

I would like to reshape my JSON into a df laid out like this:

date bitcoin prices bitcoin market_caps bitcoin total_volumes ethereum prices ethereum market_caps ethereum total_volumes
1637920962758 55084.24409740329 1040185692035.8112 4096.986983019884 ... ...
1637924583096 ... ... ... ... ... ...

I have been going through the documentation and tutorial below, but I cannot get them to work with the unnamed nested values.
https://pandas.pydata.org/pandas-docs/version/1.2.0/reference/api/pandas.json_normalize.html
https://www.kaggle.com/jboysen/quick-tutorial-flatten-nested-json-in-pandas/notebook

[
  [
    {
      "crypto": "bitcoin"
    }
  ],
  {
    "prices": [
      [
        1637920962758,
        55084.24409740329
      ],
      [
        1637924583096,
        54657.9826454445
      ],
      [
        1637928143387,
        54031.99796233907
      ],
      [
        1638524408000,
        56556.355173823926
      ]
    ],
    "market_caps": [
      [
        1637920962758,
        1040185692035.8112
      ],
      [
        1637924583096,
        1032137732028.0712
      ],
      [
        1637928143387,
        1020318960913.6139
      ],
      [
        1638524408000,
        1068341065780.2579
      ]
    ],
    "total_volumes": [
      [
        1637920962758,
        40002799175.46155
      ],
      [
        1637924583096,
        38579701553.8867
      ],
      [
        1637928143387,
        39373185822.85809
      ],
      [
        1638524408000,
        32567680716.236423
      ]
    ]
  },
  [
    {
      "crypto": "ethereum"
    }
  ],
  {
    "prices": [
      [
        1637920951704,
        4096.986983019884
      ],
      [
        1637924408082,
        4072.6963895955864
      ],
      [
        1637928090810,
        4021.2930336538925
      ],
      [
        1638524390000,
        4559.839444343959
      ]
    ],
    "market_caps": [
      [
        1637920951704,
        485474079335.9266
      ],
      [
        1637924408082,
        482758573953.61304
      ],
      [
        1637928090810,
        479260985689.3548
      ],
      [
        1638524390000,
        540740261905.95264
      ]
    ],
    "total_volumes": [
      [
        1637920951704,
        25972933719.35031
      ],
      [
        1637924408082,
        26468521371.13646
      ],
      [
        1637928090810,
        27042124946.11916
      ],
      [
        1638524390000,
        20268892519.524815
      ]
    ]
  }
]

Assuming js is your parsed JSON, this is how I would do it.

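import pandas as pd

frames = []
# The list alternates: a [{"crypto": name}] label, then that coin's data dict.
for i in range(0, len(js), 2):
    data = js[i + 1]
    # Each series is a list of [timestamp, value] pairs.
    prices = [k[1] for k in data["prices"]]
    market_caps = [k[1] for k in data["market_caps"]]
    total_volumes = [k[1] for k in data["total_volumes"]]
    date = [k[0] for k in data["total_volumes"]]
    crypto = js[i][0]["crypto"]
    frames.append(pd.DataFrame({"crypto": crypto, "prices": prices, "market_caps": market_caps, "total_volumes": total_volumes, "date": date}))
# Stack the per-coin frames; each keeps its own 0..n-1 index.
df = pd.concat(frames)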

Output:

     crypto        prices   market_caps  total_volumes           date
0   bitcoin  55084.244097  1.040186e+12   4.000280e+10  1637920962758
1   bitcoin  54657.982645  1.032138e+12   3.857970e+10  1637924583096
2   bitcoin  54031.997962  1.020319e+12   3.937319e+10  1637928143387
3   bitcoin  56556.355174  1.068341e+12   3.256768e+10  1638524408000
0  ethereum   4096.986983  4.854741e+11   2.597293e+10  1637920951704
1  ethereum   4072.696390  4.827586e+11   2.646852e+10  1637924408082
2  ethereum   4021.293034  4.792610e+11   2.704212e+10  1637928090810
3  ethereum   4559.839444  5.407403e+11   2.026889e+10  1638524390000
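
The date column holds epoch timestamps in milliseconds. If readable timestamps are preferred, a minimal sketch of the conversion could look like this; the small df here is only a stand-in for the one built by the loop, using two of the bitcoin timestamps from the question.

```python
import pandas as pd

# Stand-in for the long-format df: two bitcoin rows from the question's data.
df = pd.DataFrame({
    "crypto": ["bitcoin", "bitcoin"],
    "date": [1637920962758, 1637924583096],
})

# Interpret the integers as milliseconds since the Unix epoch (UTC, timezone-naive).
df["date"] = pd.to_datetime(df["date"], unit="ms")
print(df)
```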

This long layout scales better to more coins, and you can filter for the cryptocurrency you want like this:

df[df.crypto == "bitcoin"]

Output:

    crypto        prices   market_caps  total_volumes           date
0  bitcoin  55084.244097  1.040186e+12   4.000280e+10  1637920962758
1  bitcoin  54657.982645  1.032138e+12   3.857970e+10  1637924583096
2  bitcoin  54031.997962  1.020319e+12   3.937319e+10  1637928143387
3  bitcoin  56556.355174  1.068341e+12   3.256768e+10  1638524408000
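
If the wide layout from the question is still wanted (one column per coin and metric), the sketch below is one possible way to get there. It makes two assumptions that are not in the original answer: the js literal is a stand-in cut down to two rows per coin, and rows are aligned by position, with the first coin's timestamps reused as the shared date column, because the bitcoin and ethereum timestamps in the data do not match exactly.

```python
import pandas as pd

# Stand-in for the parsed JSON: same shape as the question's data, two rows per coin.
js = [
    [{"crypto": "bitcoin"}],
    {
        "prices": [[1637920962758, 55084.24409740329], [1637924583096, 54657.9826454445]],
        "market_caps": [[1637920962758, 1040185692035.8112], [1637924583096, 1032137732028.0712]],
        "total_volumes": [[1637920962758, 40002799175.46155], [1637924583096, 38579701553.8867]],
    },
    [{"crypto": "ethereum"}],
    {
        "prices": [[1637920951704, 4096.986983019884], [1637924408082, 4072.6963895955864]],
        "market_caps": [[1637920951704, 485474079335.9266], [1637924408082, 482758573953.61304]],
        "total_volumes": [[1637920951704, 25972933719.35031], [1637924408082, 26468521371.13646]],
    },
]

names, frames = [], []
# Walk the label/data pairs: even positions are labels, odd positions are data dicts.
for label, payload in zip(js[0::2], js[1::2]):
    names.append(label[0]["crypto"])
    # Keep only the value from each [timestamp, value] pair, one column per metric.
    frames.append(pd.DataFrame({metric: [value for _, value in pairs]
                                for metric, pairs in payload.items()}))

# Place the coins side by side; keys= gives (coin, metric) column pairs.
wide = pd.concat(frames, axis=1, keys=names)
# Flatten the column pairs into names such as "bitcoin prices".
wide.columns = [f"{coin} {metric}" for coin, metric in wide.columns]
# Assumption: reuse the first coin's timestamps as the shared date column.
wide.insert(0, "date", [ts for ts, _ in js[1]["prices"]])
print(wide)
```

Aligning by position only makes sense while every coin has the same number of samples taken at roughly the same times; otherwise the long layout above is the safer choice.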