Parse JSON nested array in Python, keep the mapping to the JSON object
I have a large JSON file with the following structure:
{
    "Project": {
        "AAA": {
            "Version": [
                {
                    "id": "00001",
                    "name": "08.12.2019",
                    "description": null,
                    "released": true,
                    "releaseDate": "2019-08-12"
                },
                {
                    "id": "00002",
                    "name": "2019.8.26",
                    "description": null,
                    "released": true,
                    "releaseDate": "2019-08-26"
                }
            ]
        },
        "BBB": {
            "Version": [
                {
                    "id": "00003",
                    "name": "AABBY3",
                    "description": "2019 Accounting Year End",
                    "released": false,
                    "releaseDate": null
                },
                {
                    "id": "00004",
                    "name": "AACCZ4",
                    "description": "Financial Statements 2019",
                    "released": false,
                    "releaseDate": null
                },
                {
                    "id": "00005",
                    "name": "AADDZ5",
                    "description": null,
                    "released": false,
                    "releaseDate": null
                }
            ]
        }
    }
}
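Because of the nested arrays, I am having trouble converting this into a pandas DataFrame. How can I extract all the data in every Version for each Project while keeping a reference to the Project?
So far I have only managed to get a DataFrame with the following structure: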
df.head(3)
Out[10]:
      description     id    name releaseDate  released
0  Version 5.4.1.  10703  V5R4M1  2010-09-15      True
1   Version 5.5.1  10704  V5R5M1  2015-04-20      True
2   Version 6.1.1  10705  V6R1M1  2016-10-14      True
Using the following code:
import json
import pandas as pd

with open("fixVer2.json", "r") as read_file:
    data = json.load(read_file)

prj_list = ['AAA', 'BBB', 'CCC', 'DDD']
d_list = []
for x in prj_list:
    d = data['Project'][x]['Version']
    for el in d:
        d_list.append(el)
df = pd.DataFrame(d_list)
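However, because some names are duplicated across projects with different releaseDates, I need to keep the Project name so that I can identify the correct releaseDate for each name.
Desired output: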
      description     id    name releaseDate  released Project
0  Version 5.4.1.  10703  V5R4M1  2010-09-15      True     CCC
1   Version 5.5.1  10704  V5R5M1  2015-04-20      True     CCC
2   Version 6.1.1  10705  V6R1M1  2016-10-14      True     CCC
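I am not sure how to parse the nested array, keep the Project name, and combine everything into a single DataFrame or other Python structure.
In your own solution, you can change the step that appends each version so that it also records the project: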
d_list = []
for x in prj_list:
    d = data['Project'][x]['Version']
    for el in d:
        # tag each version record with the project it came from
        el['Project'] = x
        d_list.append(el)
Or use a list comprehension:
prj_list = ['AAA', 'BBB']
# build a new dict per version, adding the project name under the 'Project' key
d_list = [{**el, 'Project': x} for x in prj_list for el in data['Project'][x]['Version']]
df = pd.DataFrame(d_list)
print(df)
      id        name                description  released releaseDate Project
0  00001  08.12.2019                       None      True  2019-08-12     AAA
1  00002   2019.8.26                       None      True  2019-08-26     AAA
2  00003      AABBY3   2019 Accounting Year End     False        None     BBB
3  00004      AACCZ4  Financial Statements 2019     False        None     BBB
4  00005      AADDZ5                       None     False        None     BBB
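For reference, here is a self-contained sketch of the comprehension variant with a trimmed copy of the sample data inlined instead of read from a file. Skipping project names that are missing from the data is an assumption added here (the question's prj_list names projects that do not appear in the sample); the names raw and rows are only illustrative.

```python
import json

import pandas as pd

# Trimmed copy of the question's sample data, inlined so the sketch runs on its own.
raw = """
{"Project": {
  "AAA": {"Version": [
    {"id": "00001", "name": "08.12.2019", "description": null,
     "released": true, "releaseDate": "2019-08-12"},
    {"id": "00002", "name": "2019.8.26", "description": null,
     "released": true, "releaseDate": "2019-08-26"}]},
  "BBB": {"Version": [
    {"id": "00003", "name": "AABBY3", "description": "2019 Accounting Year End",
     "released": false, "releaseDate": null}]}
}}
"""
data = json.loads(raw)

# "CCC" is deliberately absent from the sample to exercise the membership check.
prj_list = ["AAA", "BBB", "CCC"]

# Build a new dict for every version, so the dicts inside `data` are not modified.
# Assumption: projects missing from the file should be skipped rather than raise KeyError.
rows = [
    {**el, "Project": prj}
    for prj in prj_list
    if prj in data["Project"]
    for el in data["Project"][prj]["Version"]
]
df = pd.DataFrame(rows)

print(df[["Project", "id", "name", "releaseDate"]])
```

Unlike the loop that assigns el['Project'] = x, the comprehension leaves the loaded data unchanged, which may matter if data is reused later.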
Try this:
import json
import pandas as pd

with open("test.json", "r") as read_file:
    data = json.load(read_file)['Project']

d_list = []
# iterate over every project in the file; `name` is the project key
for name, dat in data.items():
    for d in dat['Version']:
        d['Project'] = name
        d_list.append(d)
df = pd.DataFrame(d_list)
print(df)
  Project                description     id        name releaseDate  released
0     AAA                       None  00001  08.12.2019  2019-08-12      True
1     AAA                       None  00002   2019.8.26  2019-08-26      True
2     BBB   2019 Accounting Year End  00003      AABBY3        None     False
3     BBB  Financial Statements 2019  00004      AACCZ4        None     False
4     BBB                       None  00005      AADDZ5        None     False
With this approach, you do not need to maintain a separate list of projects. Hope this helps!
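Another possible way to get the same shape, sketched here as an alternative rather than as part of the answers above, is to build one DataFrame per project, tag it with DataFrame.assign, and stack the pieces with pd.concat. The data below is a made-up, trimmed stand-in for json.load(read_file)['Project'], in which the version name 2019.8.26 is deliberately repeated in both projects to mimic the duplicate names the question describes.

```python
import pandas as pd

# Hypothetical stand-in for json.load(read_file)["Project"]; the duplicated
# name "2019.8.26" and the BBB release date are invented for illustration.
projects = {
    "AAA": {"Version": [
        {"id": "00001", "name": "08.12.2019", "releaseDate": "2019-08-12"},
        {"id": "00002", "name": "2019.8.26", "releaseDate": "2019-08-26"},
    ]},
    "BBB": {"Version": [
        {"id": "00003", "name": "2019.8.26", "releaseDate": "2019-09-30"},
    ]},
}

# One DataFrame per project, each with a constant Project column added.
frames = [
    pd.DataFrame(proj["Version"]).assign(Project=name)
    for name, proj in projects.items()
]
df = pd.concat(frames, ignore_index=True)

# With the Project column in place, a duplicated name can be resolved
# by filtering on both Project and name.
match = df[(df["Project"] == "AAA") & (df["name"] == "2019.8.26")]
release_date = match["releaseDate"].iloc[0]
print(release_date)
```

Filtering on name alone would return two rows here; adding the Project condition narrows it to the single intended release date.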