Transforming multi-array JSON data into a flattened DataFrame with Python and pandas
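I am getting a nested, multi-level JSON response from an API. The top level is keyed by date, and the level below it is keyed by a label that duplicates data already present in the record it points to (size_name and size_origin), like this: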
{
    "2021-11-04": {
        "40-41 (25-27)": {
            "sales": 26,
            "balance": 480,
            "size_name": "40-41",
            "size_origin": "25-27"
        },
        "42-43 (27-29)": {
            "sales": 63,
            "balance": 817,
            "size_name": "42-43",
            "size_origin": "27-29"
        }
    },
    "2021-11-05": {
        "40-41 (25-27)": {
            "sales": 35,
            "balance": 445,
            "size_name": "40-41",
            "size_origin": "25-27"
        },
        "42-43 (27-29)": {
            "sales": 95,
            "balance": 725,
            "size_name": "42-43",
            "size_origin": "27-29"
        }
    }
}
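What I need instead of this nested structure is a flat list of objects, so that I can easily build a DataFrame from it with pandas. How can I do that?
Desired result: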
[
    {
        "day": "2021-11-04",
        "sales": 26,
        "balance": 480,
        "size_name": "40-41",
        "size_origin": "25-27"
    },
    {
        "day": "2021-11-04",
        "sales": 63,
        "balance": 817,
        "size_name": "42-43",
        "size_origin": "27-29"
    },
    {
        "day": "2021-11-05",
        "sales": 35,
        "balance": 445,
        "size_name": "40-41",
        "size_origin": "25-27"
    },
    {
        "day": "2021-11-05",
        "sales": 95,
        "balance": 725,
        "size_name": "42-43",
        "size_origin": "27-29"
    }
]
I am fine with doing the conversion in pandas rather than on the JSON itself, but I still do not understand how to transform this structure.
You can do it with a simple double loop:
import pandas as pd

data = {
    "2021-11-04": {
        "40-41 (25-27)": {
            "sales": 26,
            "balance": 480,
            "size_name": "40-41",
            "size_origin": "25-27"
        },
        "42-43 (27-29)": {
            "sales": 63,
            "balance": 817,
            "size_name": "42-43",
            "size_origin": "27-29"
        }
    },
    "2021-11-05": {
        "40-41 (25-27)": {
            "sales": 35,
            "balance": 445,
            "size_name": "40-41",
            "size_origin": "25-27"
        },
        "42-43 (27-29)": {
            "sales": 95,
            "balance": 725,
            "size_name": "42-43",
            "size_origin": "27-29"
        }
    }
}

records = []
# Outer loop: one entry per date; inner loop: one record per size label.
for date, date_dict in data.items():
    for rec_id, rec in date_dict.items():
        # Store the date on the record itself (this also modifies the dict inside `data`).
        rec['day'] = date
        records.append(rec)
Output:
>>> records
[{'sales': 26,
  'balance': 480,
  'size_name': '40-41',
  'size_origin': '25-27',
  'day': '2021-11-04'},
 {'sales': 63,
  'balance': 817,
  'size_name': '42-43',
  'size_origin': '27-29',
  'day': '2021-11-04'},
 {'sales': 35,
  'balance': 445,
  'size_name': '40-41',
  'size_origin': '25-27',
  'day': '2021-11-05'},
 {'sales': 95,
  'balance': 725,
  'size_name': '42-43',
  'size_origin': '27-29',
  'day': '2021-11-05'}]
>>> pd.DataFrame(records)
   sales  balance size_name size_origin         day
0     26      480     40-41       25-27  2021-11-04
1     63      817     42-43       27-29  2021-11-04
2     35      445     40-41       25-27  2021-11-05
3     95      725     42-43       27-29  2021-11-05
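If you would rather not touch the original response, the same flattening can probably be written as a single comprehension that builds new dicts instead of adding the day key to the existing ones. This is only a sketch under the assumption that every date maps to a dict of flat records, as in the sample. The names sizes and df are mine, and converting day with pd.to_datetime is an optional extra that the question did not ask for.

```python
import pandas as pd

# Sample response from the question.
data = {
    "2021-11-04": {
        "40-41 (25-27)": {"sales": 26, "balance": 480, "size_name": "40-41", "size_origin": "25-27"},
        "42-43 (27-29)": {"sales": 63, "balance": 817, "size_name": "42-43", "size_origin": "27-29"},
    },
    "2021-11-05": {
        "40-41 (25-27)": {"sales": 35, "balance": 445, "size_name": "40-41", "size_origin": "25-27"},
        "42-43 (27-29)": {"sales": 95, "balance": 725, "size_name": "42-43", "size_origin": "27-29"},
    },
}

# Build a new dict per record with "day" first; the size label key is
# dropped because size_name and size_origin already carry that information.
records = [
    {"day": day, **rec}
    for day, sizes in data.items()
    for rec in sizes.values()
]

df = pd.DataFrame(records)

# Optional: parse the date strings so the column can be used for date-based operations.
df["day"] = pd.to_datetime(df["day"])
```

Because each record is copied with {"day": day, **rec}, the dicts inside data stay as the API returned them, and day comes out as the first column, matching the order in the desired result.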