How do I convert an XML file to a pandas dataframe?

I am trying to convert an XML file with the following format:

<ann>
  <anime id="24235" gid="2583955622" type="TV" name="Love After World Domination" precision="TV" generated-on="2021-04-06T00:15:25Z">
    <related-prev rel="adapted from" id="24234"/>
    <info gid="1661578035" type="Main title" lang="EN">Love After World Domination</info>
    <info gid="2103040388" type="Alternative title" lang="JA">Sekai Seifuku no Ato de</info>
    <info gid="2069464047" type="Alternative title" lang="JA">恋は世界征服のあとで</info>
    <staff gid="1364018953">
    ...
    </staff>
    <staff gid="2582001321">
    ...
    </staff>
  </anime>
  <manga id="24225" gid="1003998999" type="manga" name="She's My Knight" precision="manga" generated-on="2021-04-06T00:21:21Z">
    <info gid="2757138724" type="Picture" src="https://cdn.animenewsnetwork.com/thumbnails/fit200x200/encyc/A24225-2757138724.1617642733.jpg" width="140" height="200">
    ...
    </info>
    <info gid="1643119455" type="Main title" lang="EN">She's My Knight</info>
    <info gid="2475002983" type="Alternative title" lang="JA">Ikemen Kanojo to Heroine na Ore!?</info>
    <info gid="2034824415" type="Alternative title" lang="JA">イケメン彼女とヒロインな俺!?</info>
    <info gid="1694554971" type="Plot Summary">Haruma Ichinose, 17, has been popular since he was born. So popular, in fact, that he figured no one could even come close until he met Yuki Mogami. She's tall, cool, collected, and totally makes him crazy. He may just be in love but falling for someone even more dashing than himself is hard to swallow.</info>
    <info gid="2542157561" type="Vintage">2019 (serialized on Palcy)</info>
    <info gid="851836011" type="Vintage">2019-10-22 (serialized on Palcy)</info>
    <staff gid="307631293">
      <task>Story &amp; Art</task>
      <person id="206223">Saisou</person>
    </staff>
  </manga>
  <anime id="24224" gid="885535394" type="TV" name="Watanuki-san Chi to" precision="TV" generated-on="2021-04-06T00:21:21Z">
  ...
  </anime>
  ...

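into a pandas dataframe with the ID, name, and plot summary (if there is one) of each anime as columns. I have been able to get a dataframe with the anime IDs and names using this code, but I cannot get the plot summaries: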

import requests
import pandas as pd
import xml.etree.ElementTree as ET

response = requests.get('https://cdn.animenewsnetwork.com/encyclopedia/api.xml?title=24235/24233/24232/24231/24230/24229/24227/24225/24224/24223/24222/24220/24218/24217/24216/24215/24214/24213/24212/24211/24210/24209/24208/24207/24206/24205/24204/24203/24202/24201/24200/24199/24198/24196/24195/24194/24193/24192/24191/24189/24187/24186/24185/24183/24182/24180/24179/24178/24177/24176/')
root = ET.fromstring(response.text)

dfcols = ['id', 'name']
anime_df = pd.DataFrame(columns=dfcols)
# Note: DataFrame.append was removed in pandas 2.0, so this loop needs pandas < 2.0.
for i in root.iter(tag='anime'):
    anime_df = anime_df.append(
        pd.Series([i.get('id'), i.get('name')], index=dfcols),
        ignore_index=True)
anime_df.head()

I can also get the existing plot summaries with this code:

plot_list = root.findall('.//info[@type="Plot Summary"]')

for plot in plot_list:
    print(plot.text)

However, because I am using findall on the whole tree, I cannot link the plot summaries to their corresponding ids/names. Any ideas?

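I suggest pulling all the data into dictionaries and doing the final work in the dataframe. That is more efficient than creating a Series for each row and appending it.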

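The solution I propose below collects id and name in one dictionary (a defaultdict named data), while pulling the plot summaries into a separate dictionary (mapping) keyed by id.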

After that, you can convert both to pandas data structures and merge them.

from collections import defaultdict

data = defaultdict(list)  # column name -> list of values, one per entry
mapping = {}              # entry id -> plot summary text

for entry in root:
    data['id'].append(entry.attrib['id'])
    data['name'].append(entry.attrib['name'])
    # Search only this entry's own <info> children, so the summary stays tied to its id.
    for ent in entry.findall("./info"):
        if ent.attrib['type'] == "Plot Summary":
            mapping[entry.attrib['id']] = ent.text


In [150]: pd.DataFrame(data).merge(pd.Series(mapping, name='plot_summary'), 
                                   left_on='id', 
                                   right_index=True, 
                                   how='left')
Out[150]: 
       id                                               name                                       plot_summary
0   24235                        Love After World Domination                                                NaN
1   24233                          Himitsu Kessha Yaruminati                                                NaN
2   24232                          Enman Kaiketsu! Enma-chan                                                NaN
3   24231                          Zenryoku Kaihi Flag-chan!                                                NaN
4   24230                               Konketsu no Karekore                                                NaN
5   24229                                      Teikō Penguin                                                NaN
6   24227                                      Black Channel                                                NaN
7   24225                                    She's My Knight  Haruma Ichinose, 17, has been popular since he...
8   24224                                Watanuki-san Chi to                                                NaN
9   24223                                Watanuki-san Chi no                                                NaN
10  24222                                    Tiger & Bunny 2                                                NaN
11  24220                                          Super Cub                                                NaN
12  24218                                           FUUTO PI                                                NaN
13  24217                                        Fūto Tantei                                                NaN
14  24216                                       Inō no Aicis                                                NaN
15  24215                     Gyakuten Sekai no Denchi Shōjo                                                NaN
16  24214                                     Eiga Yurukyan△                                                NaN
17  24213                            Re:cycle of Penguindrum                                                NaN
18  24212            That Time I Got Reincarnated as a Slime                                                NaN
19  24211                                Wonder Egg Priority                                                NaN
20  24210                                 Dosukoi Sushi-Zumō                                                NaN
21  24209          Motto! Majime ni Fumajime Kaiketsu Zorori                                                NaN
22  24208                                     Pui Pui Molcar                                                NaN
23  24207                              Case Study of Vanitas                                                NaN
24  24206                                              HOME!                                                NaN
25  24205                         Hachimitsu Suicide Machine                                                NaN
26  24204  Deliver Police: Nishitokyo-shi Deliver Keisats...                                                NaN
27  24203                               Ryūsatsu no Kyōkotsu                                                NaN
28  24202                          Muteking the Dancing Hero                                                NaN
29  24201                                      World Trigger                                                NaN
30  24200  Gekijō-ban Utano☆Princesama♪ Maji Love ST☆RISH...                                                NaN
31  24199  My Hero Academia THE MOVIE: World Heroes' Mission                                                NaN
32  24198                            Vampire Dies in No Time                                                NaN
33  24196                                      Visual Prison                                                NaN
34  24195                               IDOLiSH7 Third Beat!  Kujo starts carrying out his plans to defame G...
35  24194                                   Jujutsu Kaisen 0                                                NaN
36  24193                        Gekijō-ban Jujutsu Kaisen 0                                                NaN
37  24192                                           takt op.                                                NaN
38  24191        She Professed Herself Pupil of the Wise Man                                                NaN
39  24189                             Akebi's Sailor Uniform                                                NaN
40  24187                                     Love and Heart  Sure, university freshman Yagisawa has a lot o...
41  24186                                   Do It Yourself!!                                                NaN
42  24185                                   Ningen Kaishūsha                                                NaN
43  24183                           Kanashiki Debu Neko-chan                                                NaN
44  24182                    Ikinuke! Bakusō! Kusohamu-chan!                                                NaN
45  24180                                        Kaiju No. 8                                                NaN
46  24179                                       Phantom Seer                                                NaN
47  24178                      Magu-chan: God of Destruction  The God of Destruction Magu Menueku has been s...
48  24177                                           i tell c                                                NaN
49  24176                 High School Family: Kokosei Kazoku                                                NaN
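
For comparison, here is a minimal single-pass sketch of the same idea that can be run without calling the API. It is not part of the answer above: SAMPLE_XML is a hypothetical, trimmed-down document in the same shape as the response, and entries_to_frame is an illustrative helper name. It assumes each entry has at most one "Plot Summary" info element, and it looks that element up per entry so the summary never gets separated from its id and name.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical, trimmed-down sample in the same shape as the API response.
SAMPLE_XML = """
<ann>
  <anime id="24235" name="Love After World Domination">
    <info type="Main title" lang="EN">Love After World Domination</info>
  </anime>
  <manga id="24225" name="She's My Knight">
    <info type="Main title" lang="EN">She's My Knight</info>
    <info type="Plot Summary">Haruma Ichinose, 17, has been popular since he was born.</info>
  </manga>
</ann>
"""


def entries_to_frame(root):
    """Build one row per direct child of the root: id, name, plot_summary."""
    rows = []
    for entry in root:
        # find() returns the first matching direct child, or None if there is none.
        plot = entry.find("info[@type='Plot Summary']")
        rows.append({
            "id": entry.get("id"),
            "name": entry.get("name"),
            "plot_summary": plot.text if plot is not None else None,
        })
    # Construct the dataframe once, instead of growing it row by row.
    return pd.DataFrame(rows, columns=["id", "name", "plot_summary"])


df = entries_to_frame(ET.fromstring(SAMPLE_XML))
print(df)
```

With the real response, the same helper could be called as entries_to_frame(root). Entries without a plot summary get a missing value in that column, which matches the how='left' merge above.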