How to flatten a column of nested json objects in an already flattened dataframe

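I have a json file with nested objects, which I have flattened into a pandas dataframe. One column still contains nested json objects, and I'm finding it hard to flatten.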

I have tried many approaches, and this is the one that got me the furthest.

Any help is appreciated, thanks.

Unfortunately, I couldn't find a Python equivalent of jsfiddle to share a working example.

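I know that json_normalize's meta parameter lets me add columns to my dataframe. But that doesn't help with the column that is still nested: I got json_normalize working in my setup by setting record_path to 'markets', the main json object in my file. So with this setup, I can't set record_path to 'marketStats' and pull in the related columns through the meta parameter.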
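For reference, one possible way around this, sketched under the assumption that the file has the structure of the snippet in the Data section: pass the 'markets' list itself to json_normalize instead of the top-level dict. Then record_path='marketStats' points at the inner list, and meta can pull fields such as 'id' or 'symbol' from each enclosing market. A list entry such as ['latestBase', 'price'] reaches into a nested object and becomes a column named 'latestBase.price'. The data below is a hypothetical, trimmed-down copy of the file; with the real file you would pass data['markets'] after loading it.

```python
import pandas as pd

# Hypothetical trimmed sample with the same shape as the real file
data = {
    "markets": [
        {
            "id": 335,
            "symbol": "ETHBTC",
            "latestBase": {"price": "0.077653"},
            "marketStats": [
                {"algorithm": "original", "ratio": "50.0", "medianDrop": "-4.08"},
                {"algorithm": "day_trade", "ratio": "100.0", "medianDrop": "-6.12"},
            ],
        },
        {
            "id": 337,
            "symbol": "LTCBTC",
            "latestBase": {"price": "0.003346"},
            "marketStats": [
                {"algorithm": "original", "ratio": "57.14", "medianDrop": "-3.28"},
            ],
        },
    ]
}

# Normalize the inner 'marketStats' lists (one row per stats object) and copy
# selected fields of the enclosing market onto each row via meta.
stats = pd.json_normalize(
    data["markets"],
    record_path="marketStats",
    meta=["id", "symbol", ["latestBase", "price"]],  # nested path -> 'latestBase.price'
)
print(stats)
```

Each row is one marketStats object, with the market's id, symbol and latestBase.price repeated alongside it.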

Goal

The goal is to turn one or all of the json objects inside marketStats into columns of the dataframe.

Code

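import json

import pandas as pd

with open('Data/20012022.json') as file:
    data = json.load(file)

# Flatten data: one row per market; nested dicts such as latestBase become
# latestBase.* columns, but lists such as marketStats are left as-is
df0 = pd.json_normalize(data, record_path=['markets'])

df0.head(3)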

Screenshot

This is a screenshot of the table as it currently stands; the marketStats column contains the nested json.

Data

Here is a snippet from the json file:

{
  "markets": [
    {
      "id": 335,
      "baseCurrency": "eth",
      "quoteCurrency": "btc",
      "exchangeName": "Binance",
      "exchangeCode": "BINA",
      "longName": "BTC-ETH",
      "marketName": "btc-eth",
      "symbol": "ETHBTC",
      "volume": "40624.5823",
      "quoteVolume": "3026.13646935",
      "btcVolume": "3026.13646935",
      "usdVolume": "127009429.050524367",
      "currentPrice": 0.074681,
      "latestBase": {
        "id": 161774475,
        "time": 1639576800,
        "date": "2021-12-15T14:00:00.000+00:00",
        "price": "0.077653",
        "lowestPrice": "0.0729",
        "bounce": "6.283",
        "currentDrop": "-3.8272829124438206",
        "crackedAt": "2022-01-07T03:00:00.000Z",
        "respectedAt": "2022-01-15T15:00:00.000Z",
        "isLowest": false
      },
      "marketStats": [
        {
          "algorithm": "original",
          "ratio": "50.0",
          "medianDrop": "-4.08",
          "medianBounce": "5.51",
          "hoursToRespected": 106,
          "crackedCount": 2,
          "respectedCount": 1
        },
        {
          "algorithm": "day_trade",
          "ratio": "100.0",
          "medianDrop": "-6.12",
          "medianBounce": "6.28",
          "hoursToRespected": 204,
          "crackedCount": 1,
          "respectedCount": 1
        },
        {
          "algorithm": "conservative",
          "ratio": "100.0",
          "medianDrop": "-6.12",
          "medianBounce": "8.38",
          "hoursToRespected": 204,
          "crackedCount": 1,
          "respectedCount": 1
        },
        {
          "algorithm": "position",
          "ratio": "50.0",
          "medianDrop": "-6.12",
          "medianBounce": "6.19",
          "hoursToRespected": 204,
          "crackedCount": 2,
          "respectedCount": 1
        },
        {
          "algorithm": "hodloo",
          "ratio": "50.0",
          "medianDrop": "-3.29",
          "medianBounce": "0.0",
          "hoursToRespected": 225,
          "crackedCount": 4,
          "respectedCount": 2
        }
      ]
    },
    {
      "id": 337,
      "baseCurrency": "ltc",
      "quoteCurrency": "btc",
      "exchangeName": "Binance",
      "exchangeCode": "BINA",
      "longName": "BTC-LTC",
      "marketName": "btc-ltc",
      "symbol": "LTCBTC",
      "volume": "68309.637",
      "quoteVolume": "223.79294524",
      "btcVolume": "223.79294524",
      "usdVolume": "9392773.4219378968",
      "currentPrice": 0.003275,
      "latestBase": {
        "id": 163982984,
        "time": 1642374000,
        "date": "2022-01-16T23:00:00.000+00:00",
        "price": "0.003346",
        "lowestPrice": "0.00322",
        "bounce": "3.839",
        "currentDrop": "-2.1219366407650926",
        "crackedAt": "2022-01-18T23:00:00.000Z",
        "respectedAt": null,
        "isLowest": false
      },
      "marketStats": [
        {
          "algorithm": "original",
          "ratio": "57.14",
          "medianDrop": "-3.28",
          "medianBounce": "3.84",
          "hoursToRespected": 186,
          "crackedCount": 7,
          "respectedCount": 4
        },
        {
          "algorithm": "day_trade",
          "ratio": "0.0",
          "medianDrop": "0.0",
          "medianBounce": "5.68",
          "hoursToRespected": 0,
          "crackedCount": 1,
          "respectedCount": 0
        },
        {
          "algorithm": "conservative",
          "ratio": "0.0",
          "medianDrop": "0.0",
          "medianBounce": "5.68",
          "hoursToRespected": 0,
          "crackedCount": 1,
          "respectedCount": 0
        },
        {
          "algorithm": "position",
          "ratio": "0.0",
          "medianDrop": "0.0",
          "medianBounce": "8.16",
          "hoursToRespected": 0,
          "crackedCount": 1,
          "respectedCount": 0
        },
        {
          "algorithm": "hodloo",
          "ratio": "75.0",
          "medianDrop": "-3.7",
          "medianBounce": "0.0",
          "hoursToRespected": 35,
          "crackedCount": 4,
          "respectedCount": 3
        }
      ]
    },
    {
      "id": 339,
      "baseCurrency": "bnb",
      "quoteCurrency": "btc",
      "exchangeName": "Binance",
      "exchangeCode": "BINA",
      "longName": "BTC-BNB",
      "marketName": "btc-bnb",
      "symbol": "BNBBTC",
      "volume": "154576.177",
      "quoteVolume": "1724.66664804",
      "btcVolume": "1724.66664804",
      "usdVolume": "72385673.4448901928",
      "currentPrice": 0.01099,
      "latestBase": {
        "id": 163753765,
        "time": 1642068000,
        "date": "2022-01-13T10:00:00.000+00:00",
        "price": "0.01093",
        "lowestPrice": "0.01093",
        "bounce": "3.102",
        "currentDrop": "0.5489478499542543",
        "crackedAt": null,
        "respectedAt": null,
        "isLowest": false
      },
      "marketStats": [
        {
          "algorithm": "original",
          "ratio": "100.0",
          "medianDrop": "-7.18",
          "medianBounce": "4.34",
          "hoursToRespected": 62,
          "crackedCount": 2,
          "respectedCount": 2
        },
        {
          "algorithm": "day_trade",
          "ratio": "100.0",
          "medianDrop": "-6.19",
          "medianBounce": "4.3",
          "hoursToRespected": 63,
          "crackedCount": 1,
          "respectedCount": 1
        },
        {
          "algorithm": "conservative",
          "ratio": "66.67",
          "medianDrop": "-3.15",
          "medianBounce": "4.05",
          "hoursToRespected": 62,
          "crackedCount": 3,
          "respectedCount": 2
        },
        {
          "algorithm": "position",
          "ratio": "100.0",
          "medianDrop": "-3.15",
          "medianBounce": "4.46",
          "hoursToRespected": 60,
          "crackedCount": 2,
          "respectedCount": 2
        },
        {
          "algorithm": "hodloo",
          "ratio": "100.0",
          "medianDrop": "-7.46",
          "medianBounce": "0.0",
          "hoursToRespected": 62,
          "crackedCount": 5,
          "respectedCount": 5
        }
      ]
    }
  ]
}

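You can get what you want with some post-processing of df0. Here, you can explode the 'marketStats' column, which gives one row per element of each list, and then apply pd.Series to turn each dict into columns: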

df1 = df0.explode('marketStats')['marketStats'].apply(pd.Series)

df1 looks like this:

    algorithm       ratio    medianDrop    medianBounce    hoursToRespected    crackedCount    respectedCount
--  ------------  -------  ------------  --------------  ------------------  --------------  ----------------
 0  original        50            -4.08            5.51                 106               2                 1
 0  day_trade      100            -6.12            6.28                 204               1                 1
 0  conservative   100            -6.12            8.38                 204               1                 1
 0  position        50            -6.12            6.19                 204               2                 1
 0  hodloo          50            -3.29            0                    225               4                 2
 1  original        57.14         -3.28            3.84                 186               7                 4
 1  day_trade        0             0               5.68                   0               1                 0
 1  conservative     0             0               5.68                   0               1                 0
 1  position         0             0               8.16                   0               1                 0
 1  hodloo          75            -3.7             0                     35               4                 3
 2  original       100            -7.18            4.34                  62               2                 2
 2  day_trade      100            -6.19            4.3                   63               1                 1
 2  conservative    66.67         -3.15            4.05                  62               3                 2
 2  position       100            -3.15            4.46                  60               2                 2
 2  hodloo         100            -7.46            0                     62               5                 5
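In the JSON snippet, ratio, medianDrop and medianBounce are quoted strings, so after apply(pd.Series) those columns still hold text (object dtype), even though the printed table looks numeric. The sketch below uses a hypothetical trimmed sample in place of df0. It builds an equivalent df1 with pd.DataFrame(exploded.tolist(), index=exploded.index), which avoids constructing one Series per row, and then converts the string columns with pd.to_numeric.

```python
import pandas as pd

# Hypothetical two-market sample shaped like df0 after json_normalize
df0 = pd.DataFrame({
    "symbol": ["ETHBTC", "LTCBTC"],
    "marketStats": [
        [{"algorithm": "original", "ratio": "50.0", "medianDrop": "-4.08"},
         {"algorithm": "hodloo", "ratio": "50.0", "medianDrop": "-3.29"}],
        [{"algorithm": "original", "ratio": "57.14", "medianDrop": "-3.28"}],
    ],
})

# One element per row; explode keeps the original market index (0, 0, 1)
exploded = df0["marketStats"].explode()

# Build the stats columns from the list of dicts, reusing the exploded index
df1 = pd.DataFrame(exploded.tolist(), index=exploded.index)

# These fields are strings in the JSON, so convert them to numbers
numeric_cols = ["ratio", "medianDrop"]
df1[numeric_cols] = df1[numeric_cols].apply(pd.to_numeric)
print(df1.dtypes)
```

Keeping the exploded index is what lets df1 be joined back onto df0 afterwards.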

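If you want it combined with all the other columns, you can use join. Because explode keeps df0's row index, each stats row lines up with its market, and that market's columns are repeated once per algorithm: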

df0.join(df1)

I won't post the output of this command because it is fairly large.
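Since the goal is to turn one or all of the marketStats objects into columns, here is a sketch of two variations on the join, again using a hypothetical trimmed sample in place of df0, with the raw marketStats column dropped before joining. Option 1 keeps a single algorithm ('original' here, chosen only as an example), so each market stays on one row. Option 2 pivots every algorithm into its own set of columns with unstack; the original_ratio style of column naming is an arbitrary choice.

```python
import pandas as pd

# Hypothetical two-market sample shaped like df0 after json_normalize
df0 = pd.DataFrame({
    "symbol": ["ETHBTC", "LTCBTC"],
    "marketStats": [
        [{"algorithm": "original", "ratio": "50.0", "crackedCount": 2},
         {"algorithm": "hodloo", "ratio": "50.0", "crackedCount": 4}],
        [{"algorithm": "original", "ratio": "57.14", "crackedCount": 7},
         {"algorithm": "hodloo", "ratio": "75.0", "crackedCount": 4}],
    ],
})

exploded = df0["marketStats"].explode()
df1 = pd.DataFrame(exploded.tolist(), index=exploded.index)
base = df0.drop(columns="marketStats")

# Option 1: keep a single algorithm, giving one stats row per market
original = df1[df1["algorithm"] == "original"].drop(columns="algorithm")
one_algo = base.join(original)

# Option 2: every algorithm becomes its own set of columns (wide format)
wide = df1.set_index("algorithm", append=True).unstack("algorithm")
# Flatten the (field, algorithm) column MultiIndex into names like "original_ratio"
wide.columns = [f"{algo}_{field}" for field, algo in wide.columns]
all_algos = base.join(wide)

print(one_algo)
print(all_algos.columns.tolist())
```

If a market is missing one of the algorithms, unstack leaves NaN in that market's columns for it.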