从深度嵌套 json 创建 Dataframe

Create Dataframe from deeply nested json

我正在尝试阅读下面的 json 模式到数据框,我可以通过遍历所有节点将其转换为我喜欢的类型,但这可能需要一段时间,因为原始 json 文件要长得多然后这个例子。 (以万计)

    "data": [
        {
            "node": {
                "id": 2,
                "title": "Berserk",
                "main_picture": {
                    "medium": "https:\/\/api-cdn.myanimelist.net\/images\/manga\/1\/157897.jpg",
                    "large": "https:\/\/api-cdn.myanimelist.net\/images\/manga\/1\/157897l.jpg"
                },
                "alternative_titles": {
                    "synonyms": [
                        "Berserk: The Prototype"
                    ],
                    "en": "Berserk",
                    "ja": "\u30d9\u30eb\u30bb\u30eb\u30af"
                },
                "start_date": "1989-08-25",
                "end_date": "2021-09-10",
                "synopsis": "Guts, a former mercenary now known as the \"Black Swordsman,\" is out for revenge. After a tumultuous childhood, he finally finds someone he respects and believes he can trust, only to have everything fall apart when this person takes away everything important to Guts for the purpose of fulfilling his own desires. Now marked for death, Guts becomes condemned to a fate in which he is relentlessly pursued by demonic beings.\n\nSetting out on a dreadful quest riddled with misfortune, Guts, armed with a massive sword and monstrous strength, will let nothing stop him, not even death itself, until he is finally able to take the head of the one who stripped him\u2014and his loved one\u2014of their humanity.\n\n[Written by MAL Rewrite]\n\nIncluded one-shot:\nVolume 14: Berserk: The Prototype",
                "mean": 9.42,
                "rank": 1,
                "popularity": 2,
                "num_list_users": 468103,
                "num_scoring_users": 223269,
                "nsfw": "white",
                "created_at": "1970-01-01T00:00:00+00:00",
                "updated_at": "2021-12-03T19:02:32+00:00",
                "media_type": "manga",
                "status": "on_hiatus",
                "genres": [
                    {
                        "id": 1,
                        "name": "Action"
                    },
                    {
                        "id": 2,
                        "name": "Adventure"
                    },
                    {
                        "id": 10,
                        "name": "Fantasy"
                    },
                    {
                        "id": 40,
                        "name": "Psychological"
                    },
                    {
                        "id": 37,
                        "name": "Supernatural"
                    },
                    {
                        "id": 38,
                        "name": "Military"
                    },
                    {
                        "id": 6,
                        "name": "Demons"
                    },
                    {
                        "id": 8,
                        "name": "Drama"
                    },
                    {
                        "id": 14,
                        "name": "Horror"
                    },
                    {
                        "id": 41,
                        "name": "Seinen"
                    }
                ],
                "my_list_status": {
                    "status": "completed",
                    "is_rereading": false,
                    "num_volumes_read": 0,
                    "num_chapters_read": 0,
                    "score": 9,
                    "updated_at": "2020-05-29T15:13:01+00:00"
                },
                "num_volumes": 41,
                "num_chapters": 380,
                "authors": [
                    {
                        "node": {
                            "id": 1868,
                            "first_name": "Kentarou",
                            "last_name": "Miura"
                        },
                        "role": "Story & Art"
                    }
                ]
            },
            "ranking": {
                "rank": 1
            }
        },
        {
            "node": {
                "id": 1706,
                "title": "JoJo no Kimyou na Bouken Part 7: Steel Ball Run",
                "main_picture": {
                    "medium": "https:\/\/api-cdn.myanimelist.net\/images\/manga\/3\/179882.jpg",
                    "large": "https:\/\/api-cdn.myanimelist.net\/images\/manga\/3\/179882l.jpg"
                },
                "alternative_titles": {
                    "synonyms": [
                        "JoJo's Bizarre Adventure Part 7: Steel Ball Run",
                        "SBR"
                    ],
                    "en": "",
                    "ja": "\u30b8\u30e7\u30b8\u30e7\u306e\u5947\u5999\u306a\u5192\u967a Part7 STEEL BALL RUN"
                },
                "start_date": "2004-01-19",
                "end_date": "2011-04-19",
                "synopsis": "In the American Old West, the world's greatest race is about to begin. Thousands line up in San Diego to travel over six thousand kilometers for a chance to win the grand prize of fifty million dollars. With the era of the horse reaching its end, contestants are allowed to use any kind of vehicle they wish. Competitors will have to endure grueling conditions, traveling up to a hundred kilometers a day through uncharted wastelands. The Steel Ball Run is truly a one-of-a-kind event.\n\nThe youthful Johnny Joestar, a crippled former horse racer, has come to San Diego to watch the start of the race. There he encounters Gyro Zeppeli, a racer with two steel balls at his waist instead of a gun. Johnny witnesses Gyro using one of his steel balls to unleash a fantastical power, compelling a man to fire his gun at himself during a duel. In the midst of the action, Johnny happens to touch the steel ball and feels a power surging through his legs, allowing him to stand up for the first time in two years. Vowing to find the secret of the steel balls, Johnny decides to compete in the race, and so begins his bizarre adventure across America on the Steel Ball Run.\n\n[Written by MAL Rewrite]",
                "mean": 9.25,
                "rank": 2,
                "popularity": 27,
                "num_list_users": 178230,
                "num_scoring_users": 105411,
                "nsfw": "white",
                "created_at": "1970-01-01T00:00:00+00:00",
                "updated_at": "2021-12-03T18:56:19+00:00",
                "media_type": "manga",
                "status": "finished",
                "genres": [
                    {
                        "id": 1,
                        "name": "Action"
                    },
                    {
                        "id": 2,
                        "name": "Adventure"
                    },
                    {
                        "id": 37,
                        "name": "Supernatural"
                    },
                    {
                        "id": 13,
                        "name": "Historical"
                    },
                    {
                        "id": 7,
                        "name": "Mystery"
                    },
                    {
                        "id": 14,
                        "name": "Horror"
                    },
                    {
                        "id": 27,
                        "name": "Shounen"
                    },
                    {
                        "id": 41,
                        "name": "Seinen"
                    }
                ],
                "num_volumes": 24,
                "num_chapters": 96,
                "authors": [
                    {
                        "node": {
                            "id": 2619,
                            "first_name": "Hirohiko",
                            "last_name": "Araki"
                        },
                        "role": "Story & Art"
                    }
                ]
            },
            "ranking": {
                "rank": 2
            }
        }
    ],
    "paging": {
        "next": "https:\/\/api.myanimelist.net\/v2\/manga\/ranking?offset=2&ranking_type=manga&limit=2&fields=id%2Ctitle%2Cmain_picture%2Calternative_titles%2Cstart_date%2Cend_date%2Csynopsis%2Cmean%2Crank%2Cpopularity%2Cnum_list_users%2Cnum_scoring_users%2Cnsfw%2Ccreated_at%2Cupdated_at%2Cmedia_type%2Cstatus%2Cgenres%2Cmy_list_status%2Cnum_volumes%2Cnum_chapters%2Cauthors%7Bfirst_name%2Clast_name%7D%2Cpictures%2Cbackground%2Crelated_anime%2Crelated_manga%2Crecommendations"
    }
}

让我纠结的专栏是流派和作者。我只想要流派列中的名称字段,以及作者列中的 first_name、last_name 列。

我想要的格式都是

   genres                             first_name    last_name            
0  Action, Adventure, Fantasy,        Kentarou      Miura
   Psychological, Supernatural ...

1  Action, Adventure, Supernatural,   Hirohiko      Araki
   Historical, Mystery, Horror ...

我目前的代码是这样的

import pandas as pd
import json
with open('./response.json') as file:
    data = json.load(file)

df = pd.json_normalize(data['data'])

此代码以这种格式提供流派和作者列

                                         node.genres                                       node.authors
0  [{'id': 1, 'name': 'Action'}, {'id': 2, 'name'...  [{'node': {'id': 1868, 'first_name': 'Kentarou...
1  [{'id': 1, 'name': 'Action'}, {'id': 2, 'name'...  [{'node': {'id': 2619, 'first_name': 'Hirohiko...
2  [{'id': 1, 'name': 'Action'}, {'id': 2, 'name'...  [{'node': {'id': 1881, 'first_name': 'Eiichiro...
3  [{'id': 1, 'name': 'Action'}, {'id': 2, 'name'...  [{'node': {'id': 1911, 'first_name': 'Takehiko...
4  [{'id': 8, 'name': 'Drama'}, {'id': 7, 'name':...  [{'node': {'id': 1867, 'first_name': 'Naoki', ...

我想知道,如果我可以使用无需遍历所有行即可获得我喜欢的类型的技巧。

构建作者和流派的个人 DataFrame 并加入原始 df:

authors = pd.json_normalize(data["data"], record_path=["node",["authors"]])[["node.first_name", "node.last_name"]]
genres = pd.Series([", ".join(g["name"] for g in d) for d in df["node.genres"].tolist()]).rename("genres")
output = df.join(authors.join(genres))

或者,您可以执行以下任一操作。在有限的测试中,在 str 方法上使用 apply() 是最快的。

选项 1

df['genres'] = df.apply(lambda x: ', '.join([d['name'] for d in x['node.genres']]), axis=1)
df['first_name'] = df['node.authors'].str[0].str['node'].str['first_name']
df['last_name'] = df['node.authors'].str[0].str['node'].str['last_name']

选项 2

df['genres'] = df.apply(lambda x: ', '.join([d['name'] for d in x['node.genres']]), axis=1)
df['first_name'] = df.apply(lambda x: x['node.authors'][0]['node']['first_name'], axis=1)
df['last_name'] = df.apply(lambda x: x['node.authors'][0]['node']['last_name'], axis=1)

修改选项 2(最快)

df['genres'] = df.apply(lambda x: ', '.join([d['name'] for d in x['node.genres']]), axis=1)
df [['first_name', 'last_name']] = df.apply(lambda x: (x['node.authors'][0]['node']['first_name'], x['node.authors'][0]['node']['last_name']), axis=1)

时间结果:

answer 1 from not_speshal: 
1.94 ms ± 30.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

My options in order:
1.45 ms ± 45.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.35 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
925 µs ± 6.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)