Convert a MultiIndex pandas DataFrame to nested JSON

I have the following DataFrame in pandas, with a MultiIndex on the rows:

               time    available_slots       status
month day                                          
1     1    10:00:00                  1    AVAILABLE
      1    12:00:00                  1    AVAILABLE
      1    14:00:00                  1    AVAILABLE
      1    16:00:00                  1    AVAILABLE
      1    18:00:00                  1    AVAILABLE
      2    10:00:00                  1    AVAILABLE
...             ...                ...          ...
2     28   12:00:00                  1    AVAILABLE
      28   14:00:00                  1    AVAILABLE
      28   16:00:00                  1    AVAILABLE
      28   18:00:00                  1    AVAILABLE
      28   20:00:00                  1    AVAILABLE
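
For anyone who wants to reproduce this, a frame with the same shape can be built roughly as follows. This is only a sketch: the `rows` list is made-up sample data covering just the rows visible above, and it assumes `time` is stored as plain strings (if the real column holds `datetime.time` objects, they would need converting before JSON serialization).

```python
import pandas as pd

# Hypothetical sample rows (month, day, time) mirroring the excerpt above;
# the real frame has more days than this.
rows = [
    (1, 1, "10:00:00"), (1, 1, "12:00:00"), (1, 1, "14:00:00"),
    (1, 1, "16:00:00"), (1, 1, "18:00:00"), (1, 2, "10:00:00"),
    (2, 28, "12:00:00"), (2, 28, "14:00:00"), (2, 28, "16:00:00"),
    (2, 28, "18:00:00"), (2, 28, "20:00:00"),
]

df = pd.DataFrame(rows, columns=["month", "day", "time"])
df["available_slots"] = 1        # every slot in the excerpt has 1 available
df["status"] = "AVAILABLE"

# Move month and day into a two-level row index.
df = df.set_index(["month", "day"])
print(df)
```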

I need to convert it to a hierarchical nested JSON like this:

[
    {
        "month": 1,
        "days": [
            {
                "day": 1,
                "slots": [
                    {
                        "time": "10:00:00",
                        "available_slots": 1,
                        "status": "AVAILABLE"
                    },
                    {
                        "time": "12:00:00",
                        "available_slots": 1,
                        "status": "AVAILABLE"
                    },
                    ...
                ]
            },
            {
                "day": 2,
                "slots": [
                    ...
                ]
            }
        ]
    },
    {
        "month": 2,
        "days":[
            {
                "day": 1,
                "slots": [
                    ...
                ]
            }
        ]
    },
    ...
]

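Unfortunately, this is not as easy as calling df.to_json(orient="index").

Does anyone know whether pandas has a method that performs this kind of conversion? Or how I could iterate over the DataFrame to build the final object?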

Here is one way. Basically, repeat groupby + apply(to_dict) + reset_index until you get the desired shape:

out = (df.groupby(level=[0,1])                  # group by (month, day)
       .apply(lambda x: x.to_dict('records'))   # one list of slot dicts per day
       .reset_index()
       .rename(columns={0:'slots'})
       .groupby('month')
       # one list of {'day', 'slots'} dicts per month
       .apply(lambda x: x[['day','slots']].to_dict('records'))
       .reset_index()
       .rename(columns={0:'days'})
       .to_json(orient='records', indent=1)     # indent is a number of spaces
      )

Output:

[
 {
  "month":1,
  "days":[
   {
    "day":1,
    "slots":[
     {
      "time":"10:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     },
     {
      "time":"12:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     },
     {
      "time":"14:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     },
     {
      "time":"16:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     },
     {
      "time":"18:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     }
    ]
   },
   {
    "day":2,
    "slots":[
     {
      "time":"10:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     }
    ]
   }
  ]
 },
 {
  "month":2,
  "days":[
   {
    "day":28,
    "slots":[
     {
      "time":"12:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     },
     {
      "time":"14:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     },
     {
      "time":"16:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     },
     {
      "time":"18:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     },
     {
      "time":"20:00:00",
      "available_slots":1,
      "status":"AVAILABLE"
     }
    ]
   }
  ]
 }
]
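
To see why the chain works, it can help to look at the intermediate result after the first groupby/apply/reset_index pass. The sketch below runs only that first stage on a cut-down, made-up sample; the name `stage1` and the sample values are assumptions for illustration, not part of the original data.

```python
import pandas as pd

# Small hypothetical sample with the same structure as the question's frame.
df = pd.DataFrame(
    {
        "month": [1, 1, 1, 2],
        "day": [1, 1, 2, 28],
        "time": ["10:00:00", "12:00:00", "10:00:00", "12:00:00"],
        "available_slots": [1, 1, 1, 1],
        "status": ["AVAILABLE"] * 4,
    }
).set_index(["month", "day"])

# Stage 1: collapse each (month, day) group into a single list of row dicts.
# apply() returns an unnamed Series, so reset_index() labels that column 0.
stage1 = (
    df.groupby(level=[0, 1])
    .apply(lambda g: g.to_dict("records"))
    .reset_index()
    .rename(columns={0: "slots"})
)

print(stage1[["month", "day"]])   # one row per (month, day) pair
print(stage1.loc[0, "slots"])     # the list of slot dicts for month 1, day 1
```

At this point `month` and `day` are ordinary columns and `slots` holds Python lists, so the second groupby on `month` can repeat the same trick one level up.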

You can use a double loop, one for each level of the index:

data = []
for month, df1 in df.groupby(level=0):
    data.append({'month': month, 'days': []})
    for day, df2 in df1.groupby(level=1):
        data[-1]['days'].append({'day': day, 'slots': df2.to_dict('records')})

Output:

import json
print(json.dumps(data, indent=2))

[
  {
    "month": 1,
    "days": [
      {
        "day": 1,
        "slots": [
          {
            "time": "10:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          },
          {
            "time": "12:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          },
          {
            "time": "14:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          },
          {
            "time": "16:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          },
          {
            "time": "18:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          }
        ]
      },
      {
        "day": 2,
        "slots": [
          {
            "time": "10:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          }
        ]
      }
    ]
  },
  {
    "month": 2,
    "days": [
      {
        "day": 28,
        "slots": [
          {
            "time": "12:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          },
          {
            "time": "14:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          },
          {
            "time": "16:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          },
          {
            "time": "18:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          },
          {
            "time": "20:00:00",
            "available_slots": 1,
            "status": "AVAILABLE"
          }
        ]
      }
    ]
  }
]
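
If the index has more than two levels, the same loop idea could be written recursively. This is a rough sketch rather than a pandas feature: the helper `nest` and its `spec` argument are hypothetical names invented here, and the sample frame is made up. `spec` lists, outermost first, each index level name together with the key under which its children should be stored.

```python
import json
import pandas as pd

def nest(frame, spec):
    """Nest `frame` by index level; `spec` is [(level_name, child_key), ...]."""
    if not spec:
        # No levels left to group by: emit the plain row records.
        return frame.to_dict("records")
    (level, child_key), rest = spec[0], spec[1:]
    out = []
    for key, group in frame.groupby(level=level):
        # Convert a NumPy scalar key to a plain Python value so json can dump it.
        key = key.item() if hasattr(key, "item") else key
        out.append({level: key, child_key: nest(group, rest)})
    return out

# Small hypothetical sample with the same structure as the question's frame.
df = pd.DataFrame(
    {
        "month": [1, 1, 1, 2],
        "day": [1, 1, 2, 28],
        "time": ["10:00:00", "12:00:00", "10:00:00", "12:00:00"],
        "available_slots": [1, 1, 1, 1],
        "status": ["AVAILABLE"] * 4,
    }
).set_index(["month", "day"])

data = nest(df, [("month", "days"), ("day", "slots")])
print(json.dumps(data, indent=2))
```

Each group keeps the full MultiIndex, which is why the recursive call can group by the next level name directly.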