Converting xml into dataframe
I am trying to convert my XML request (see the sample below) into a pandas DataFrame, but it does not work as it should, and I am not sure why.
Sample XML request
<workingTimes>
<day>
<date>2015-09-21</date>
<dayOfWeek>Mon</dayOfWeek>
<employee>
<firstName>Albert</firstName>
<lastName>Grimaldi</lastName>
<login xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
<personnelNumber xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
<duration>00:00:00</duration>
<rest mandatory="00:00:00">00:00:00</rest>
<costCenter>AB-1234</costCenter>
</employee>
<employee>
<firstName>Max</firstName>
<lastName>Mustermann</lastName>
<login xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
<personnelNumber>12346</personnelNumber>
<duration>00:00:00</duration>
<rest mandatory="00:00:00">00:00:00</rest>
<costCenter>AB-1234</costCenter>
</employee>
</day>
<day>
<date>2015-09-22</date>
<dayOfWeek>Tue</dayOfWeek>
<employee>
<firstName>Albert</firstName>
<lastName>Grimaldi</lastName>
<login xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
<personnelNumber xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
<duration>00:00:00</duration>
<rest mandatory="00:00:00">00:00:00</rest>
<costCenter>AB-1234</costCenter>
</employee>
<employee>
<firstName>Max</firstName>
<lastName>Mustermann</lastName>
<login xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
<personnelNumber>12346</personnelNumber>
<duration>00:00:00</duration>
<rest mandatory="00:00:00">00:00:00</rest>
<costCenter>AB-1234</costCenter>
</employee>
</day>
</workingTimes>
Code:
import pandas as pd
from xml.etree import ElementTree as et
...
r = requests.get(api_url, headers=headers)
root = et.fromstring(r.content)
df_cols, rows = ['date', 'dayOfWeek', 'firstName', 'lastName', 'duration', 'costCenter'], []
for child in root:
    s_date = child.attrib.get("date")
    s_dayOfWeek = child.attrib.get("dayOfWeek")
    s_firstName = child.find("firstName").text if child is not None else None
    s_lastName = child.find("lastName").text if child is not None else None
    s_duration= child.find("duration").duration if child is not None else None
    s_costCenter= child.find("costCenter").text if child is not None else None
    rows.append({'date': s_date, 'dayOfWeek': s_dayOfWeek, 'firstName': s_firstName, 'lastName':
                 s_lastName, 'duration': s_duration, 's_costCenter': costCenter})
df_xml = pd.DataFrame(rows, columns=df_cols)
This is part of the documentation:
Can anyone tell me what I am doing wrong?
See below (just extend the code to collect more elements)
import xml.etree.ElementTree as ET
XML = '''<workingTimes>
<day>
<date>2015-09-21</date>
<dayOfWeek>Mon</dayOfWeek>
<employee>
<firstName>Albert</firstName>
<lastName>Grimaldi</lastName>
<login xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
<personnelNumber xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
<duration>00:00:00</duration>
<rest mandatory="00:00:00">00:00:00</rest>
<costCenter>AB-1234</costCenter>
</employee>
<employee>
<firstName>Max</firstName>
<lastName>Mustermann</lastName>
<login xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
<personnelNumber>12346</personnelNumber>
<duration>00:00:00</duration>
<rest mandatory="00:00:00">00:00:00</rest>
<costCenter>AB-1234</costCenter>
</employee>
</day>
<day>
<date>2015-09-22</date>
<dayOfWeek>Tue</dayOfWeek>
<employee>
<firstName>Albert</firstName>
<lastName>Grimaldi</lastName>
<login xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
<personnelNumber xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
<duration>00:00:00</duration>
<rest mandatory="00:00:00">00:00:00</rest>
<costCenter>AB-1234</costCenter>
</employee>
<employee>
<firstName>Max</firstName>
<lastName>Mustermann</lastName>
<login xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
<personnelNumber>12346</personnelNumber>
<duration>00:00:00</duration>
<rest mandatory="00:00:00">00:00:00</rest>
<costCenter>AB-1234</costCenter>
</employee>
</day>
</workingTimes>'''
data = []
root = ET.fromstring(XML)
days = root.findall('.//day')
for d in days:
    emp_lst = d.findall('employee')
    for e in emp_lst:
        # TODO collect more data
        data.append(
            {'day': d.find('date').text, 'first_name': e.find('firstName').text, 'last_name': e.find('lastName').text})
for entry in data:
    print(entry)
Output
{'day': '2015-09-21', 'first_name': 'Albert', 'last_name': 'Grimaldi'}
{'day': '2015-09-21', 'first_name': 'Max', 'last_name': 'Mustermann'}
{'day': '2015-09-22', 'first_name': 'Albert', 'last_name': 'Grimaldi'}
{'day': '2015-09-22', 'first_name': 'Max', 'last_name': 'Mustermann'}
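As for why the loop in the question yields nothing useful: in the sample, date and dayOfWeek are child elements of <day>, not attributes, so child.attrib.get(...) returns None, and firstName, lastName, duration and costCenter sit one level deeper, under <employee>, so child.find("firstName") on a <day> returns None as well. The sketch below fills in the TODO for all six columns from the question. It is only one possible way to do it: the names DAY_FIELDS, EMPLOYEE_FIELDS and collect_rows are made up for illustration, the XML is a shortened copy of the sample, and findtext is used so that a missing element gives None instead of raising.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document (one day, two employees).
XML = """<workingTimes>
  <day>
    <date>2015-09-21</date>
    <dayOfWeek>Mon</dayOfWeek>
    <employee>
      <firstName>Albert</firstName>
      <lastName>Grimaldi</lastName>
      <personnelNumber xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
      <duration>00:00:00</duration>
      <costCenter>AB-1234</costCenter>
    </employee>
    <employee>
      <firstName>Max</firstName>
      <lastName>Mustermann</lastName>
      <personnelNumber>12346</personnelNumber>
      <duration>00:00:00</duration>
      <costCenter>AB-1234</costCenter>
    </employee>
  </day>
</workingTimes>"""

# Hypothetical helper names; the column names themselves come from the question.
DAY_FIELDS = ["date", "dayOfWeek"]
EMPLOYEE_FIELDS = ["firstName", "lastName", "duration", "costCenter"]


def collect_rows(xml_text):
    """Return one dict per employee per day, with the day's fields repeated."""
    root = ET.fromstring(xml_text)
    rows = []
    for day in root.findall("day"):
        # date and dayOfWeek are child elements of <day>, so read their text.
        day_values = {name: day.findtext(name) for name in DAY_FIELDS}
        for employee in day.findall("employee"):
            row = dict(day_values)
            for name in EMPLOYEE_FIELDS:
                # findtext returns None when the element is absent.
                row[name] = employee.findtext(name)
            rows.append(row)
    return rows


rows = collect_rows(XML)
for row in rows:
    print(row)
```

With pandas installed, pd.DataFrame(rows, columns=DAY_FIELDS + EMPLOYEE_FIELDS) should then give one row per employee per day, which appears to be what the question is after.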