Transforming multi-level JSON data into a flattened DataFrame with Python and pandas

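I get a nested, multi-level JSON response from an API. One level is keyed by date, and the level below it is keyed by a string that duplicates data already present in the lower-level objects, like this: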

{
   "2021-11-04": {
      "40-41 (25-27)": {
         "sales": 26,
         "balance": 480,
         "size_name": "40-41",
         "size_origin": "25-27"
      },
      "42-43 (27-29)": {
         "sales": 63,
         "balance": 817,
         "size_name": "42-43",
         "size_origin": "27-29"
      }
   },
   "2021-11-05": {
      "40-41 (25-27)": {
         "sales": 35,
         "balance": 445,
         "size_name": "40-41",
         "size_origin": "25-27"
      },
      "42-43 (27-29)": {
         "sales": 95,
         "balance": 725,
         "size_name": "42-43",
         "size_origin": "27-29"
      }
   }
}

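What I need instead of this nested structure is a flat collection of objects, so that I can easily build a DataFrame from it with pandas. How can I do that?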

Desired result:

[
  {
    "day": "2021-11-04",
    "sales": 26,
    "balance": 480,
    "size_name": "40-41",
    "size_origin": "25-27"
  },
  {
    "day": "2021-11-04",
    "sales": 63,
    "balance": 817,
    "size_name": "42-43",
    "size_origin": "27-29"
  },
  {
    "day": "2021-11-05",
    "sales": 35,
    "balance": 445,
    "size_name": "40-41",
    "size_origin": "25-27"
  },
  {
    "day": "2021-11-05",
    "sales": 95,
    "balance": 725,
    "size_name": "42-43",
    "size_origin": "27-29"
  }
]

The conversion can happen in pandas rather than in the JSON itself, but I still don't understand how to transform this structure.

You can do it with a simple double loop:

data = {
   "2021-11-04": {
      "40-41 (25-27)": {
         "sales": 26,
         "balance": 480,
         "size_name": "40-41",
         "size_origin": "25-27"
      },
      "42-43 (27-29)": {
         "sales": 63,
         "balance": 817,
         "size_name": "42-43",
         "size_origin": "27-29"
      }
   },
   "2021-11-05": {
      "40-41 (25-27)": {
         "sales": 35,
         "balance": 445,
         "size_name": "40-41",
         "size_origin": "25-27"
      },
      "42-43 (27-29)": {
         "sales": 95,
         "balance": 725,
         "size_name": "42-43",
         "size_origin": "27-29"
      }
   }
}

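import pandas as pd

records = []
# Outer loop walks the dates; inner loop walks the size entries of each date.
for date, date_dict in data.items():
    for rec_id, rec in date_dict.items():
        # Tag each record with its date (note: this adds the key to the dict inside `data`).
        rec['day'] = date
        records.append(rec)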

Output:

>>> records

[{'sales': 26,
  'balance': 480,
  'size_name': '40-41',
  'size_origin': '25-27',
  'day': '2021-11-04'},
 {'sales': 63,
  'balance': 817,
  'size_name': '42-43',
  'size_origin': '27-29',
  'day': '2021-11-04'},
 {'sales': 35,
  'balance': 445,
  'size_name': '40-41',
  'size_origin': '25-27',
  'day': '2021-11-05'},
 {'sales': 95,
  'balance': 725,
  'size_name': '42-43',
  'size_origin': '27-29',
  'day': '2021-11-05'}]

>>> pd.DataFrame(records)

   sales  balance size_name size_origin         day
0     26      480     40-41       25-27  2021-11-04
1     63      817     42-43       27-29  2021-11-04
2     35      445     40-41       25-27  2021-11-05
3     95      725     42-43       27-29  2021-11-05
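
If you would rather not modify the original `data` dict, or you want `day` as the first column as in the desired result, the same double loop can be written as a list comprehension that builds new dicts. This is only a sketch under the assumption that every record sits exactly two levels deep (date, then size key); the helper name `flatten_by_day` is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Same shape as the API response in the question.
data = {
    "2021-11-04": {
        "40-41 (25-27)": {"sales": 26, "balance": 480,
                          "size_name": "40-41", "size_origin": "25-27"},
        "42-43 (27-29)": {"sales": 63, "balance": 817,
                          "size_name": "42-43", "size_origin": "27-29"},
    },
    "2021-11-05": {
        "40-41 (25-27)": {"sales": 35, "balance": 445,
                          "size_name": "40-41", "size_origin": "25-27"},
        "42-43 (27-29)": {"sales": 95, "balance": 725,
                          "size_name": "42-43", "size_origin": "27-29"},
    },
}


def flatten_by_day(nested):
    """Return one new dict per record, with the date stored under 'day'.

    Hypothetical helper: assumes the layout {date: {size_key: record}}.
    """
    return [
        {"day": day, **rec}          # new dict, so `nested` is left untouched
        for day, sizes in nested.items()
        for rec in sizes.values()    # the size key only repeats size_name/size_origin
    ]


records = flatten_by_day(data)
df = pd.DataFrame(records)

# Optional: parse the date strings into real datetimes.
df["day"] = pd.to_datetime(df["day"])
```

Because `"day"` is written first in each new dict, it also becomes the first column of `df`, and the size keys such as `"40-41 (25-27)"` are simply dropped since the records already carry `size_name` and `size_origin`.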