Add the parent element to each child element's tuple (when converting XML to a collection of dictionaries)
My question is as follows.
Given this XML structure (saved in xml_FILE):
<countriesAndStates>
  <countries>
    <name>USA</name>
    <states>
      <active>true</active>
      <stateName>Colorado</stateName>
      <isoCode>CO</isoCode>
    </states>
    <states>
      <active>false</active>
      <stateName>Florida</stateName>
      <isoCode>FL</isoCode>
    </states>
  </countries>
</countriesAndStates>
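I iterate over the child elements of each states element with this for loop and, using defaultdict from collections, collect the results into a dictionary of lists: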
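import xml.etree.ElementTree as ET
from collections import defaultdict

import pandas as pd

tree = ET.parse(xml_FILE)
root = tree.getroot()
dict_of_list = defaultdict(list)
# root is already <countriesAndStates>, so the path starts at its children;
# the trailing "/" selects every child element of each <states>
for key in root.findall("./countries/"
                        "states/"):
    dict_of_list[key.tag].append(key.text)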
I then convert this dict to a DataFrame, which gives me all the tuples with the state element data:
df = pd.DataFrame(dict_of_list)
print(df)
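This gives the following DataFrame output (schema + tuples):
  active stateName isoCode
0   true  Colorado      CO
However, I would like each state tuple to include the country,
so that each tuple/row in the DataFrame becomes:
  name active stateName isoCode
0  USA   true  Colorado      CO
In other words: for each state record I also want the country name.
How can I achieve this?
Thanks in advance.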
Something like this:
import xml.etree.ElementTree as ET
import pandas as pd
xml = '''<countriesAndStates>
  <countries>
    <name>USA</name>
    <states>
      <active>true</active>
      <stateName>Colorado</stateName>
      <isoCode>CO</isoCode>
    </states>
    <states>
      <active>false</active>
      <stateName>Florida</stateName>
      <isoCode>FL</isoCode>
    </states>
  </countries>
</countriesAndStates>'''
data = []
root = ET.fromstring(xml)
for country in root.findall('.//countries'):
    # read the country name once per <countries> element
    name = country.find('name').text
    for state in country.findall('states'):
        # start each row with the country name, then add the state's fields
        data.append({'name': name})
        for e in list(state):
            data[-1][e.tag] = e.text
df = pd.DataFrame(data)
print(df)
Output
  name active stateName isoCode
0  USA   true  Colorado      CO
1  USA  false   Florida      FL
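The same idea can be wrapped in a small helper, which may be handy if the file contains more than one country. This is only a sketch: the function name `state_rows` is hypothetical, the second country in the sample is made-up data for illustration, and it assumes every `<states>` element holds only flat leaf elements as in the question. It builds plain dictionaries with the standard library only.

```python
import xml.etree.ElementTree as ET


def state_rows(xml_text):
    """Return one dict per <states> element, including its country's <name>."""
    root = ET.fromstring(xml_text)
    rows = []
    for country in root.iter("countries"):
        # findtext returns None (instead of raising) if <name> is missing
        country_name = country.findtext("name")
        for state in country.findall("states"):
            row = {"name": country_name}
            # copy every direct child of <states> as tag -> text
            row.update({child.tag: child.text for child in state})
            rows.append(row)
    return rows


# Sample input: the USA part is from the question, Canada is invented for illustration.
sample = """<countriesAndStates>
  <countries>
    <name>USA</name>
    <states><active>true</active><stateName>Colorado</stateName><isoCode>CO</isoCode></states>
    <states><active>false</active><stateName>Florida</stateName><isoCode>FL</isoCode></states>
  </countries>
  <countries>
    <name>Canada</name>
    <states><active>true</active><stateName>Ontario</stateName><isoCode>ON</isoCode></states>
  </countries>
</countriesAndStates>"""

rows = state_rows(sample)
```

Passing the result to `pd.DataFrame(rows)` should give the same kind of frame as above, with one row per state and the country name in the first column.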