Parse JSON nested array in Python, keep the mapping to the JSON object

Question

I have a large JSON file with the following structure:

    {
    "Project": {
        "AAA": {
            "Version": [
                {
                    "id": "00001",
                    "name": "08.12.2019",
                    "description": null,
                    "released": true,
                    "releaseDate": "2019-08-12"
                },
                {
                    "id": "00002",
                    "name": "2019.8.26",
                    "description": null,
                    "released": true,
                    "releaseDate": "2019-08-26"
                }
            ]
        },
        "BBB": {
            "Version": [
                {
                    "id": "00003",
                    "name": "AABBY3",
                    "description": "2019 Accounting Year End",
                    "released": false,
                    "releaseDate": null
                },
                {
                    "id": "00004",
                    "name": "AACCZ4",
                    "description": "Financial Statements 2019",
                    "released": false,
                    "releaseDate": null
                },
                {
                    "id": "00005",
                    "name": "AADDZ5",
                    "description": null,
                    "released": false,
                    "releaseDate": null
                }
            ]
        }
    }
    }

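Because of the nested arrays, I'm having trouble converting this into a pandas DataFrame. How can I extract all the data in each Version for each Project while keeping a reference to the Project?

So far I have only managed to get a DataFrame with the following structure: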

df.head(3)
Out[10]: 
      description     id    name releaseDate  released
0  Version 5.4.1.  10703  V5R4M1  2010-09-15      True
1   Version 5.5.1  10704  V5R5M1  2015-04-20      True
2   Version 6.1.1  10705  V6R1M1  2016-10-14      True

Using the following:

import json
import pandas as pd

with open("fixVer2.json", "r") as read_file:
    data = json.load(read_file)

prj_list = ['AAA', 'BBB', 'CCC', 'DDD']

d_list = []
for x in prj_list:
    d = data['Project'][x]['Version']
    for el in d:
        d_list.append(el)

df = pd.DataFrame(d_list)

But since some names are duplicated across projects and have different releaseDates, I need to keep the Project name in order to identify the correct releaseDate for each one.

Desired output:

      description     id    name releaseDate  released  Project
0  Version 5.4.1.  10703  V5R4M1  2010-09-15      True  CCC
1   Version 5.5.1  10704  V5R5M1  2015-04-20      True  CCC
2   Version 6.1.1  10705  V6R1M1  2016-10-14      True  CCC

I'm not sure how to parse the nested arrays, keep the Project name, and combine everything into a single DataFrame or other Python structure.

Answer 1

You can change the append step in your solution so that each element also records its project:

d_list = []
for x in prj_list:
    d = data['Project'][x]['Version']
    for el in d:
        el['Project'] = x
        d_list.append(el)

Or use a list comprehension:

prj_list = ['AAA', 'BBB']

# Merge each version dict with its project name under the 'Project' key
d_list = [{**el, 'Project': x} for x in prj_list for el in data['Project'][x]['Version']]
df = pd.DataFrame(d_list)
print(df)
      id        name                description  released releaseDate Project
0  00001  08.12.2019                       None      True  2019-08-12     AAA
1  00002   2019.8.26                       None      True  2019-08-26     AAA
2  00003      AABBY3   2019 Accounting Year End     False        None     BBB
3  00004      AACCZ4  Financial Statements 2019     False        None     BBB
4  00005      AADDZ5                       None     False        None     BBB
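Once every record carries its project, the duplicate names mentioned in the question can be told apart by keying on both fields. The following is only a minimal sketch: the inline sample values are made up for illustration (the same name "V1" is placed in two projects on purpose), and `records` and `release_dates` are hypothetical names, not part of the original code.

```python
import json

# Made-up sample shaped like the question's file; "V1" appears in both projects.
raw = """
{"Project": {
    "AAA": {"Version": [{"id": "1", "name": "V1", "releaseDate": "2019-08-12"}]},
    "BBB": {"Version": [{"id": "2", "name": "V1", "releaseDate": "2019-08-26"}]}
}}
"""
data = json.loads(raw)

# Tag every version record with its project, building new dicts
# so the parsed JSON itself is left unchanged.
records = [
    {**version, "Project": project}
    for project, body in data["Project"].items()
    for version in body["Version"]
]

# Key on (Project, name) so equal names in different projects stay distinct.
release_dates = {(r["Project"], r["name"]): r["releaseDate"] for r in records}

print(release_dates[("AAA", "V1")])
print(release_dates[("BBB", "V1")])
```

The same `records` list can be passed to `pd.DataFrame` as in the snippets above if a DataFrame is preferred over a plain dictionary.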

Answer 2

Try this:

import json
import pandas as pd

with open("test.json", "r") as read_file:
    data = json.load(read_file)['Project']
d_list = []
for name,dat in data.items():
    for d in dat['Version']:
        d['Project']=name
        d_list.append(d)
df = pd.DataFrame(d_list)
print(df)

  Project                description     id        name releaseDate  released
0     AAA                       None  00001  08.12.2019  2019-08-12      True
1     AAA                       None  00002   2019.8.26  2019-08-26      True
2     BBB   2019 Accounting Year End  00003      AABBY3        None     False
3     BBB  Financial Statements 2019  00004      AACCZ4        None     False
4     BBB                       None  00005      AADDZ5        None     False

With this approach you don't need to maintain a separate list of projects. Hope this helps!
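The loop above can also be wrapped in a small generator that copies each record instead of modifying it in place. This is a sketch under assumptions rather than a definitive implementation: `iter_versions` is a hypothetical helper name, the empty "CCC" project is added only to show the case where a project has no "Version" key (the question does not say whether that happens), and the sample is a trimmed copy of the question's data.

```python
import json


def iter_versions(data):
    """Yield one flat dict per version, tagged with its project name."""
    for project, body in data.get("Project", {}).items():
        # Assumption: a project may lack a "Version" key; treat it as empty.
        for version in body.get("Version", []):
            # Copy rather than mutate, so the parsed JSON stays unchanged.
            yield {**version, "Project": project}


# Trimmed copy of the question's data, plus an empty project for illustration.
sample = json.loads("""
{"Project": {
    "AAA": {"Version": [{"id": "00001", "name": "08.12.2019", "releaseDate": "2019-08-12"}]},
    "BBB": {"Version": [{"id": "00003", "name": "AABBY3", "releaseDate": null}]},
    "CCC": {}
}}
""")

rows = list(iter_versions(sample))
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame` (with pandas imported as `pd`) should give a table with one `Project` column alongside the version fields.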
