Records are being removed whilst parsing through XML
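I'm parsing my XML into a Pandas DF, but records are getting lost during parsing. Not all records have all attributes, and in those cases I've noticed that the record (a row in the DF) is dropped from the DF instead of being filled with "None".
Is there a way to mitigate this? I can't seem to find a solution.
I've pasted my code below for reference: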
import xml.etree.ElementTree as et
import pandas as pd

tree = et.parse('20191125_DMG_PI.xml')
root = tree.getroot()

df_cols = ["status",
           "priref",
           "full_name",
           "achternaam",
           "geboorteplaats",
           "sterfplaats",
           "detail",
           "adres",
           "zip",
           "note",
           "gender"]

rows = []
for record in root:
    for child in record:
        s_priref = ""
        s_priref = child.get('priref')
    for child in record:
        s_name_note = ""
        s_name_note = child.get('name.note')
    for child in record:
        s_surname = ""
        s_surname = child.find('surname')
        for field in child.findall('Address'):
            s_adress = ""
            s_address = field.find('address').text if field is not None else None
        for field in child.findall('Address'):
            s_zip = ""
            s_zip = field.find('address.postal_code').text if field is not None else None
        for field in child.findall('name'):
            s_full_name = ""
            s_full_name = field.find('value').text if field is not None else None
        for field in child.findall('name.status'):
            s_status = ""
            s_status = field.find('value').text if field is not None else None
        for field in child.findall('level_of_detail'):
            s_detail = ""
            s_detail = field.tag + ": " + field.find('value').text if field is not None else None
        for field in child.findall('gender'):
            s_gender = ""
            s_gender = field.find('value').text
        for field in child.findall('birth.place'):
            s_gbp = ""
            s_gbp = field.find('value').text if field is not None else None
        for field in child.findall('death.place'):
            s_pvo = ""
            if len(field.findall('death.place')) == 0:
                s_pvo = "NaN"
            else:
                s_pvo = field.find('value').text if field is not None else None
            rows.append({"status": s_status,
                         "priref": s_priref,
                         "full_name": s_full_name,
                         "achternaam": s_surname,
                         "geboorteplaats": s_gbp,
                         "sterfplaats": s_pvo,
                         "detail": s_detail,
                         "adres": s_address,
                         "zip": s_zip,
                         "note": s_name_note,
                         "gender": s_gender
                         })

out_df = pd.DataFrame(rows, columns=df_cols)
print(out_df)
The first three records look like this:
<recordList><record priref="530000001" creation="2014-06-23T11:36:18" modification="2019-09-13T09:07:12">
<name>
<value lang="">C.I.A.P.</value>
</name>
<name.type>
<value lang="neutral">ACQUISITIONSOURCE</value>
<value lang="0">acquisition source</value>
<value lang="1">verwervingsbron</value>
<value lang="2">source d'acquisition</value>
<value lang="3">Erwerbungsquelle</value>
<value lang="5">fonte di acquisizione</value>
<value lang="6">πηγή απόκτησης</value>
</name.type>
<name.type>
<value lang="neutral">INST</value>
<value lang="0">institution</value>
<value lang="1">instelling</value>
<value lang="2">institution</value>
<value lang="3">Institution</value>
<value lang="4">المؤسسة</value>
<value lang="5">istituto</value>
<value lang="6">οργανισμός</value>
</name.type>
<name.status>
<value lang="neutral">1</value>
<value lang="0">approved preferred term</value>
<value lang="1">descriptor</value>
<value lang="2">descripteur</value>
<value lang="3">Deskriptor</value>
<value lang="5">termine preferenziale approvato</value>
</name.status>
<Address>
<address>Lombaardstraat 23</address>
<address.country>
<value lang="">België</value>
</address.country>
<address.place>
<value lang="">Hasselt</value>
</address.place>
<address.postal_code>3500</address.postal_code>
<address.type />
</Address>
<level_of_detail>
<value lang="neutral">PARTIAL</value>
<value lang="0">partial</value>
<value lang="1">partieel</value>
<value lang="2">partiel</value>
<value lang="3">partiell</value>
<value lang="5">parziale</value>
</level_of_detail>
<birth.place>
<value lang="">Hasselt</value>
</birth.place>
<id_number>53</id_number>
<supplier.letter.processing>
<value lang="neutral">PRINT</value>
<value lang="0">Print to documents</value>
<value lang="1">Afdrukken naar documenten</value>
<value lang="2">Imprimer en documents</value>
<value lang="3">Ausdruck in Dokumenten</value>
<value lang="5">Stampa nei documenti</value>
</supplier.letter.processing>
<name.note>Centrum voor Informatie en Aktueel Prentenkabinet</name.note>
<Place_activity>
<place_activity.institution />
<place_activity.type />
<place_activity>
<value lang="">Hasselt</value>
</place_activity>
<place_activity.notes />
<place_activity.date.end />
<place_activity.date.start />
</Place_activity>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-09-13</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>09:07:12</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-09-12</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>13:15:16</edit.time>
</Edit>
</record><record priref="530000003" creation="2014-06-23T11:36:18" modification="2019-09-13T09:02:51">
<name>
<value lang="">Goossens, K.</value>
</name>
<name.type>
<value lang="neutral">ACQUISITIONSOURCE</value>
<value lang="0">acquisition source</value>
<value lang="1">verwervingsbron</value>
<value lang="2">source d'acquisition</value>
<value lang="3">Erwerbungsquelle</value>
<value lang="5">fonte di acquisizione</value>
<value lang="6">πηγή απόκτησης</value>
</name.type>
<name.type>
<value lang="neutral">PERSON</value>
<value lang="0">person</value>
<value lang="1">persoon</value>
<value lang="2">personne</value>
<value lang="3">Person</value>
<value lang="4">إسم شخص</value>
<value lang="5">persona</value>
<value lang="6">πρόσωπο</value>
</name.type>
<name.status>
<value lang="neutral">1</value>
<value lang="0">approved preferred term</value>
<value lang="1">descriptor</value>
<value lang="2">descripteur</value>
<value lang="3">Deskriptor</value>
<value lang="5">termine preferenziale approvato</value>
</name.status>
<surname>Goossens</surname>
<Address>
<address>Morckhovelei</address>
<address.country>
<value lang="">België</value>
</address.country>
<address.place>
<value lang="">Borgerhout</value>
</address.place>
<address.postal_code />
<address.type />
</Address>
<nationality>
<value lang="">Belgisch</value>
</nationality>
<level_of_detail>
<value lang="neutral">PARTIAL</value>
<value lang="0">partial</value>
<value lang="1">partieel</value>
<value lang="2">partiel</value>
<value lang="3">partiell</value>
<value lang="5">parziale</value>
</level_of_detail>
<forename>K.</forename>
<gender>
<value lang="neutral">FEMALE</value>
<value lang="0">female</value>
<value lang="1">vrouw</value>
<value lang="2">femme</value>
<value lang="3">weiblich</value>
<value lang="5">femmina</value>
<value lang="6">θηλυκό</value>
</gender>
<id_number>53</id_number>
<supplier.letter.processing>
<value lang="neutral">PRINT</value>
<value lang="0">Print to documents</value>
<value lang="1">Afdrukken naar documenten</value>
<value lang="2">Imprimer en documents</value>
<value lang="3">Ausdruck in Dokumenten</value>
<value lang="5">Stampa nei documenti</value>
</supplier.letter.processing>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-09-13</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>09:02:51</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-09-12</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>13:21:05</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-09-12</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>13:20:03</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-09-12</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>13:19:45</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-09-12</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>13:19:16</edit.time>
</Edit>
</record><record priref="530000004" creation="2014-06-23T11:36:18" modification="2019-07-19T09:55:26">
<name>
<value lang="">De Bruyne, Pieter</value>
</name>
<name.type>
<value lang="neutral">MAKER</value>
<value lang="0">creator</value>
<value lang="1">vervaardiger</value>
<value lang="2">créateur</value>
<value lang="3">Hersteller</value>
<value lang="4">الصانع</value>
<value lang="5">creatore</value>
<value lang="6">δημιουργός</value>
</name.type>
<name.type>
<value lang="neutral">ACQUISITIONSOURCE</value>
<value lang="0">acquisition source</value>
<value lang="1">verwervingsbron</value>
<value lang="2">source d'acquisition</value>
<value lang="3">Erwerbungsquelle</value>
<value lang="5">fonte di acquisizione</value>
<value lang="6">πηγή απόκτησης</value>
</name.type>
<name.type>
<value lang="neutral">PERSON</value>
<value lang="0">person</value>
<value lang="1">persoon</value>
<value lang="2">personne</value>
<value lang="3">Person</value>
<value lang="4">إسم شخص</value>
<value lang="5">persona</value>
<value lang="6">πρόσωπο</value>
</name.type>
<name.type>
<value lang="neutral">AUTHOR</value>
<value lang="0">author</value>
<value lang="1">auteur</value>
<value lang="2">auteur</value>
<value lang="3">Verfasser</value>
<value lang="4">المؤلف</value>
<value lang="5">autore</value>
<value lang="6">συντάκτης</value>
</name.type>
<birth.date.start>1931</birth.date.start>
<death.date.start>1987</death.date.start>
<name.status>
<value lang="neutral">1</value>
<value lang="0">approved preferred term</value>
<value lang="1">descriptor</value>
<value lang="2">descripteur</value>
<value lang="3">Deskriptor</value>
<value lang="5">termine preferenziale approvato</value>
</name.status>
<surname>De Bruyne</surname>
<Address>
<address>Stationstraat 16</address>
<address.country>
<value lang="">België</value>
</address.country>
<address.place>
<value lang="">Aalst</value>
</address.place>
<address.postal_code>9300</address.postal_code>
<address.type>woning Pieter De Bruyne</address.type>
</Address>
<biography>Pieter De Bruyne is als pionier binnen het postmodern ontwerpen een internationaal geapprecieerde meubelontwerper. Hij wijdde zijn hele leven aan de vernieuwing van het meubilair. De Bruynes werk sluit aan bij de Memphis-stijl, hoewel hij nooit actief deel wilde uitmaken van dergelijke bewegingen. Elk meubel van zijn hand opent nieuwe perspectieven en is stimulans om andere denkrichtingen in te slaan.
Bibliotheek Design museum Gent:
(1) Pieter De Bruyne 1931- 1987. Pionier van het postmoderne. / Christian Kieckens, Eva Storgaard
(2) 25 jaar Pieter De Bruyne. / Christian Norberg-Schulz</biography>
<Source>
<source>http://vocab.getty.edu/page/ulan/</source>
<source.number>500009402</source.number>
</Source>
<Source>
<source>https://www.wikidata.org/wiki/</source>
<source.number>Q14101030</source.number>
</Source>
<death.date.end>1987</death.date.end>
<death.place>
<value lang="">Aalst</value>
</death.place>
<nationality>
<value lang="">Belgisch</value>
</nationality>
<level_of_detail>
<value lang="neutral">FULL</value>
<value lang="0">full</value>
<value lang="1">volledig</value>
<value lang="2">complet</value>
<value lang="3">vollständig</value>
<value lang="5">completo</value>
</level_of_detail>
<forename>Pieter</forename>
<birth.date.end>1931</birth.date.end>
<birth.place>
<value lang="">Aalst</value>
</birth.place>
<gender>
<value lang="neutral">MALE</value>
<value lang="0">male</value>
<value lang="1">man</value>
<value lang="2">homme</value>
<value lang="3">männlich</value>
<value lang="5">maschio</value>
<value lang="6">αρσενικό</value>
</gender>
<occupation>
<value lang="">ontwerper</value>
</occupation>
<Part_of>
<part_of>
<value lang="">Pieter De Bruyne N.V.</value>
</part_of>
<part_of.notes />
<part_of.category />
<part_of.date.end />
<part_of.date.start />
</Part_of>
<Equivalent>
<equivalent_name>
<value lang="">Pieter De Bruyne N.V.</value>
</equivalent_name>
<equivalent_name.category />
</Equivalent>
<id_number>53</id_number>
<supplier.letter.processing>
<value lang="neutral">PRINT</value>
<value lang="0">Print to documents</value>
<value lang="1">Afdrukken naar documenten</value>
<value lang="2">Imprimer en documents</value>
<value lang="3">Ausdruck in Dokumenten</value>
<value lang="5">Stampa nei documenti</value>
</supplier.letter.processing>
<school_style>
<value lang="">post-modernisme</value>
</school_style>
<language>
<value lang="">Nederlands</value>
</language>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-07-19</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>09:55:26</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-07-19</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>09:55:24</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-07-17</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>11:24:24</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-06-18</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>11:54:47</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-06-12</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>11:44:02</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-05-28</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>08:20:09</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-05-27</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>10:44:41</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-05-13</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>14:24:58</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-05-13</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>14:23:25</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>people>people</edit.source>
<edit.date>2019-04-23</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>16:12:25</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>thesau>thesau</edit.source>
<edit.date>2019-04-18</edit.date>
<edit.name>ovandhuynslager</edit.name>
<edit.time>15:19:53</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>COLLECT>intern</edit.source>
<edit.date>2016-09-26</edit.date>
<edit.name>rgoris</edit.name>
<edit.time>10:58:19</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>COLLECT>intern</edit.source>
<edit.date>2016-09-26</edit.date>
<edit.name>rgoris</edit.name>
<edit.time>10:57:40</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>COLLECT>intern</edit.source>
<edit.date>2016-09-26</edit.date>
<edit.name>rgoris</edit.name>
<edit.time>10:50:49</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>COLLECT>intern</edit.source>
<edit.date>2016-09-26</edit.date>
<edit.name>rgoris</edit.name>
<edit.time>10:21:40</edit.time>
</Edit>
<Edit>
<edit.notes />
<edit.source>COLLECT>intern</edit.source>
<edit.date>2016-09-26</edit.date>
<edit.name>rgoris</edit.name>
<edit.time>10:20:30</edit.time>
</Edit>
You can greatly simplify the XML-handling part of your code by switching to XPath as the means of locating any given node. Consider this:
import xml.etree.ElementTree as et


def node_text(node, default=''):
    return node.text if node is not None and node.text is not None else default


tree = et.parse('20191125_DMG_PI.xml')

rows = []
for record in tree.iterfind('./record'):
    rows.append({
        'status': node_text(record.find('./name.status/value')),
        'priref': record.get('priref'),
        'full_name': node_text(record.find('./name/value')),
        'achternaam': node_text(record.find('./surname')),
        'geboorteplaats': node_text(record.find('./birth.place/value')),
        'sterfplaats': node_text(record.find('./death.place/value')),
        'detail': node_text(record.find('./level_of_detail/value[@lang="neutral"]')),
        'adres': node_text(record.find('./Address/address')),
        'zip': node_text(record.find('./Address/address.postal_code')),
        'note': node_text(record.find('./name.note')),
        'gender': node_text(record.find('./gender/value'))
    })

print(rows)
The node_text() helper at the top handles the "node not found" case. You can use None as the default if you prefer that over empty strings, or pass an individual default for each value.
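To illustrate the default handling, here is a small sketch that runs the same helper against an in-memory sample instead of '20191125_DMG_PI.xml'. The two-record SAMPLE string is a hypothetical reduction of the question's data, and 'UNKNOWN' is just a placeholder default chosen for illustration. Note that the helper also covers empty elements such as `<address.postal_code />`: find() returns the element, but its .text is None, so the default is used.

```python
import xml.etree.ElementTree as et

# Hypothetical two-record sample reduced from the question's data:
# the first record has neither <death.place> nor <gender>.
SAMPLE = """<recordList>
<record priref="530000001">
  <name><value lang="">C.I.A.P.</value></name>
</record>
<record priref="530000004">
  <name><value lang="">De Bruyne, Pieter</value></name>
  <death.place><value lang="">Aalst</value></death.place>
  <gender><value lang="neutral">MALE</value></gender>
</record>
</recordList>"""


def node_text(node, default=''):
    # Return the node's text, or `default` if the node is missing or has no text
    return node.text if node is not None and node.text is not None else default


rows = []
for record in et.fromstring(SAMPLE).iterfind('./record'):
    rows.append({
        'priref': record.get('priref'),
        # Uses the helper's own default ('')
        'full_name': node_text(record.find('./name/value')),
        # None instead of '' when the element is missing
        'sterfplaats': node_text(record.find('./death.place/value'), default=None),
        # A field-specific default (hypothetical placeholder value)
        'gender': node_text(record.find('./gender/value'), default='UNKNOWN'),
    })

for row in rows:
    print(row)
```

Both records produce a row; the first one simply gets None and 'UNKNOWN' where its elements are missing.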
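XPath in ElementTree has to be relative to the current element (hence the leading ./) and supports only a subset of what XPath 1.0 can do, but that subset is enough for your use case.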
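A quick check of that subset, using a hypothetical fragment shaped like the sample's `<level_of_detail>`: attribute predicates such as the `[@lang="neutral"]` used for 'detail' and positional predicates such as `[1]` are supported, while a path starting with `/` is rejected when searching from an element.

```python
import xml.etree.ElementTree as et

# Hypothetical fragment modelled on a <level_of_detail> element from the sample
record = et.fromstring(
    '<record priref="530000004">'
    '<level_of_detail>'
    '<value lang="neutral">FULL</value>'
    '<value lang="0">full</value>'
    '</level_of_detail>'
    '</record>'
)

# Attribute predicate: part of ElementTree's XPath subset
neutral = record.find('./level_of_detail/value[@lang="neutral"]').text

# Positional predicate: [1] selects the first matching <value>
first = record.find('./level_of_detail/value[1]').text

# An absolute path is refused when searching from an Element
try:
    record.find('/level_of_detail')
    absolute_ok = True
except SyntaxError:
    absolute_ok = False

print(neutral, first, absolute_ok)  # FULL FULL False
```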
Putting rows into a DataFrame afterwards should no longer be a problem.
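A minimal sketch of that last step, assuming pandas is installed and again using a hypothetical in-memory sample in place of the real file. It also contrasts the per-record approach with one plausible cause of the dropped rows: if `rows.append(...)` ends up inside a `for field in ....findall(...)` loop, a row is only produced when that element exists, so records without it vanish instead of getting None. Whether that matches the original indentation is an assumption, since it cannot be recovered from the question.

```python
import xml.etree.ElementTree as et

import pandas as pd

# Hypothetical stand-in for '20191125_DMG_PI.xml'; only the second
# record has a <death.place>.
SAMPLE = """<recordList>
<record priref="530000001">
  <name><value lang="">C.I.A.P.</value></name>
</record>
<record priref="530000004">
  <name><value lang="">De Bruyne, Pieter</value></name>
  <death.place><value lang="">Aalst</value></death.place>
</record>
</recordList>"""


def node_text(node, default=None):
    # None as the default so that pandas reports missing values as NA
    return node.text if node is not None and node.text is not None else default


root = et.fromstring(SAMPLE)

# Pitfall: appending inside a findall() loop only yields a row when the
# element exists, so the record without <death.place> is silently skipped.
dropped = []
for record in root.iterfind("./record"):
    for field in record.findall("death.place"):
        dropped.append({"priref": record.get("priref"),
                        "sterfplaats": field.find("value").text})

# One append per record: missing elements become None, the row is kept.
rows = []
for record in root.iterfind("./record"):
    rows.append({
        "priref": record.get("priref"),
        "full_name": node_text(record.find("./name/value")),
        "sterfplaats": node_text(record.find("./death.place/value")),
    })

df_cols = ["priref", "full_name", "sterfplaats"]
out_df = pd.DataFrame(rows, columns=df_cols)

print(len(dropped), len(out_df))  # 1 2
print(out_df)
```

Passing `columns=df_cols` keeps the column order fixed even if a dict happens to be built in a different order.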