Nested JSON in a customised format from a pandas DataFrame, with added labels

DataFrame

import pandas as pd

df = {"UNIT": ["UNIT1", "UNIT1", "UNIT2", "UNIT2"],
      "PROJECT": ["A", "A", "C", "C"],
      "TEAM": [1, 2, 1, 2],
      "NAME": ["FANNY", "KATY", "PERCY", "PETER"],
      "ID": [123, 234, 333, 222]}
data = pd.DataFrame(df)

    UNIT PROJECT  TEAM   NAME   ID
0  UNIT1       A     1  FANNY  123
1  UNIT1       A     2   KATY  234
2  UNIT2       C     1  PERCY  333
3  UNIT2       C     2  PETER  222

Expected output

[
    {
        "UNIT": "UNIT1",
        "PROJECT": "A",
        "TEAM_DETAIL": [
            {
                "TEAM": 1,
                "MEMBER": [
                    {
                        "NAME": "FANNY",
                        "ID": 123
                    }
                ]
            },
            {
                "TEAM": 2,
                "MEMBER": [
                    {
                        "NAME": "KATY",
                        "ID": 234
                    }
                ]
            }
        ]
    },
    {
        "UNIT": "UNIT2",
        "PROJECT": "C",
        "TEAM_DETAIL": [
            {
                "TEAM": 1,
                "MEMBER": [
                    {
                        "NAME": "PERCY",
                        "ID": 333
                    }
                ]
            },
            {
                "TEAM": 2,
                "MEMBER": [
                    {
                        "NAME": "PETER",
                        "ID": 222
                    }
                ]
            }
        ]
    }
]

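In this case, I want to group the data by TEAM so that the details of each member of each team are shown. Without the custom labels, such as TEAM_DETAIL and MEMBER, this is easy to do with .to_dict(). However, I don't know how to add a label at each level.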

You need a first groupby to build the MEMBER lists. Then you can use a second groupby to build the TEAM_DETAIL lists.
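To make the first step concrete, here is a minimal sketch of the first groupby on its own, so the intermediate MEMBER column can be inspected before the second groupby is applied. It selects the NAME and ID columns before calling apply and uses reset_index(name=...) instead of a rename; both are small variations chosen for this illustration, not part of the code in this answer.

```python
import pandas as pd

df = pd.DataFrame({
    "UNIT": ["UNIT1", "UNIT1", "UNIT2", "UNIT2"],
    "PROJECT": ["A", "A", "C", "C"],
    "TEAM": [1, 2, 1, 2],
    "NAME": ["FANNY", "KATY", "PERCY", "PETER"],
    "ID": [123, 234, 333, 222],
})

# First groupby only: one row per (UNIT, PROJECT, TEAM),
# with the members of that team collected into a list of dicts.
members = (
    df.groupby(["UNIT", "PROJECT", "TEAM"])[["NAME", "ID"]]
      .apply(lambda g: g.to_dict("records"))
      .reset_index(name="MEMBER")
)

print(members)
```

Each row of members now holds one team, and the second groupby only has to collect the TEAM and MEMBER columns under TEAM_DETAIL.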

Full code:

import pandas as pd

data = {"UNIT": ["UNIT1", "UNIT1", "UNIT2", "UNIT2"],
        "PROJECT": ["A", "A", "C", "C"],
        "TEAM": [1, 2, 1, 2],
        "NAME": ["FANNY", "KATY", "PERCY", "PETER"],
        "ID": [123, 234, 333, 222]}
df = pd.DataFrame(data)

# Named json_str so it does not shadow the standard-library json module.
json_str = (df.groupby(['UNIT', 'PROJECT', 'TEAM'])
            # first groupby: list of member records per team
            .apply(lambda x: x[['NAME', 'ID']].to_dict('records'))
            .reset_index()
            .rename(columns={0: 'MEMBER'})
            # second groupby: list of team records per unit and project
            .groupby(['UNIT', 'PROJECT'])
            .apply(lambda x: x[['TEAM', 'MEMBER']].to_dict('records'))
            .reset_index()
            .rename(columns={0: 'TEAM_DETAIL'})
            .to_json(orient='records'))

print(json_str)

Output:

[{"UNIT":"UNIT1","PROJECT":"A","TEAM_DETAIL":[{"TEAM":1,"MEMBER":[{"NAME":"FANNY","ID":123}]},{"TEAM":2,"MEMBER":[{"NAME":"KATY","ID":234}]}]},{"UNIT":"UNIT2","PROJECT":"C","TEAM_DETAIL":[{"TEAM":1,"MEMBER":[{"NAME":"PERCY","ID":333}]},{"TEAM":2,"MEMBER":[{"NAME":"PETER","ID":222}]}]}]
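If the indented layout from the question is wanted rather than a compact string, one alternative is to build the nested structure as plain Python objects and serialise it with json.dumps. The sketch below is only one way to do that; the helper name to_nested is hypothetical, and the explicit int() conversions are there on the assumption that NumPy integers should be turned into plain ints before serialising.

```python
import json

import pandas as pd

df = pd.DataFrame({
    "UNIT": ["UNIT1", "UNIT1", "UNIT2", "UNIT2"],
    "PROJECT": ["A", "A", "C", "C"],
    "TEAM": [1, 2, 1, 2],
    "NAME": ["FANNY", "KATY", "PERCY", "PETER"],
    "ID": [123, 234, 333, 222],
})


def to_nested(frame):
    """Hypothetical helper: build the UNIT -> TEAM_DETAIL -> MEMBER structure."""
    result = []
    # Outer loop: one entry per (UNIT, PROJECT) pair.
    for (unit, project), unit_df in frame.groupby(["UNIT", "PROJECT"], sort=False):
        teams = []
        # Inner loop: one entry per TEAM inside that pair.
        for team, team_df in unit_df.groupby("TEAM", sort=False):
            members = [
                {"NAME": name, "ID": int(member_id)}
                for name, member_id in zip(team_df["NAME"], team_df["ID"])
            ]
            teams.append({"TEAM": int(team), "MEMBER": members})
        result.append({"UNIT": unit, "PROJECT": project, "TEAM_DETAIL": teams})
    return result


nested = to_nested(df)
print(json.dumps(nested, indent=4))
```

The explicit loops are longer than the chained groupby, but the labels at each level are visible in one place, which can make them easier to change.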