How to update and add attributes in an XML file using a Python pandas DataFrame
I am new to XML. Is there an efficient way to match text against a pandas DataFrame and update an XML file?
This is a small part of my large XML file; it still follows the proper format.
XML file (input.xml):
<?xml version="1.0" encoding="UTF-8"?>
<brand by="hhdhdh" date="2014/01/01" name="OOP-112200" Insti="TGA">
<design name="OOP-112200" own="TGA" descri="" sound_db="JJKO">
<sec name="abcd" sound_freq="abcd" c_ty="pv">
<feature number="48">
<tfgt v="0.1466469683747654" y="0.0" units="sec" />
</feature>
<mwan sound_freq="abcd" first_name="g7tty" description="xyz" />
</sec>
<sec name="M_20_K40745170" sound_freq="mhr17:7907527-7907589" tension="SGCGSCGSCGSCGSC" s_c="0">
<feature number="5748">
<tfgt v="0.1466469683747654" y="0.0" units="sec" />
</feature>
<mwan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
</mwan>
</sec>
<sec name="M_20_K40745171" sound_freq="mhr17:7907528-7907599" tension="SGCGSCGSCGSHHGSC" s_c="0">
<feature number="5748">
<tfgt v="0.1466469683747654" y="0.0" units="sec" />
</feature>
<mwan sound_freq="mhr17:7907527-7907589" first_name="gtftty" description="xyz">
<xyz abc="trt" id="abc" />
<per fre="acc" value="abc" />
<per fre="xyz" value="abc" />
<per fre="yy" value="abc" />
</mwan>
</sec>
#file continue....
</design>
</brand>
DataFrame (used as input):
name Volum_5mb Volum_40mb Volum_70mb
1 M_20_K40745170 89.00 44.00 77.00
2 M_20_K40745171 77.00 65.00 94.00
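I want to match the elements in the name column and, if there is a match, create new attributes from the remaining columns, as shown below. For example, if an element of df['name'] (M_20_K40745170) is present in the XML, the corresponding node should be updated in the output file with the following lines: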
<per fre="Volum_5mb" value="89.00"/>
<per fre="Volum_40mb" value="44.00"/>
<per fre="Volum_70mb" value="77.00"/>
and so on.
I want the output file to look like this.
Expected XML (output.xml):
<?xml version="1.0" encoding="UTF-8"?>
<brand by="hhdhdh" date="2014/01/01" name="OOP-112200" Insti="TGA">
<design name="OOP-112200" own="TGA" descri="" sound_db="JJKO">
<sec name="abcd" sound_freq="abcd" c_ty="pv">
<feature number="48">
<tfgt v="0.1466469683747654" y="0.0" units="sec" />
</feature>
<mwan sound_freq="abcd" first_name="g7tty" description="xyz" />
</sec>
<sec name="M_20_K40745170" sound_freq="mhr17:7907527-7907589" tension="SGCGSCGSCGSCGSC" s_c="0">
<feature number="5748">
<tfgt v="0.1466469683747654" y="0.0" units="sec" />
</feature>
<mwan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
<per fre="Volum_5mb" value="89.00" />
#new attribute FYI
<per fre="Volum_40mb" value="44.00" />
#new attribute FYI
<per fre="Volum_70mb" value="77.00" />
#new attribute FYI
</mwan>
</sec>
<sec name="M_20_K40745171" sound_freq="mhr17:7907528-7907599" tension="SGCGSCGSCGSHHGSC" s_c="0">
<feature number="5748">
<tfgt v="0.1466469683747654" y="0.0" units="sec" />
</feature>
<mwan sound_freq="mhr17:7907527-7907589" first_name="gtftty" description="xyz">
<xyz abc="trt" id="abc" />
<per fre="acc" value="abc" />
<per fre="xyz" value="abc" />
<per fre="yy" value="abc" />
<per fre="Volum_5mb" value="77.00" />
#new attribute FYI
<per fre="Volum_40mb" value="65.00" />
#new attribute FYI
<per fre="Volum_70mb" value="94.00" />
#new attribute FYI
</mwan>
</sec>
#file continue....
</design>
</brand>
I am trying the xml.etree.ElementTree module:
import xml.etree.ElementTree as ET
tree = ET.parse('input.xml')
root = tree.getroot()
# df is the DataFrame shown above
for i in range(len(df)):
    for node in tree.findall("./design/sec"):
        name = node.attrib.get('name')
        if name == df.loc[i, 'name']:
            print(name)
I am new to Python and XML coding, and I don't know how to add new attributes to an XML file from a pandas DataFrame.
Please help.
Thanks and regards.
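It would help to learn XML and XPath, because the main problem is not related to pandas but to XML.
You can use a more complex XPath expression to find the node named M_20_K40745170 and its mwan subnode, in which you then search for per and update it (or add a new per):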
node = root.find('./design/sec[@name="M_20_K40745170"]//mwan')
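To see what this expression returns, here is a minimal sketch on a trimmed-down copy of the question's XML (the `sample` string and the name `no_such_name` are made up for illustration). It suggests that `find` gives back the first matching `mwan` element, or `None` when no `sec` carries that name, which is worth checking before using the result.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample based on the question's XML (illustrative only).
sample = '''<brand>
  <design>
    <sec name="abcd"><mwan first_name="g7tty" /></sec>
    <sec name="M_20_K40745170"><mwan first_name="g7tty"></mwan></sec>
  </design>
</brand>'''

root = ET.fromstring(sample)

# [@name="..."] keeps only the sec with that attribute value;
# //mwan then looks for mwan anywhere below that sec.
found = root.find('./design/sec[@name="M_20_K40745170"]//mwan')
print(found.tag, found.attrib)

# A name that does not occur in the file gives None, not an exception.
missing = root.find('./design/sec[@name="no_such_name"]//mwan')
print(missing)
```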
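You can use df.iterrows() to do this for every row:
for index, row in df.iterrows():
    node = root.find('./design/sec[@name="{}"]//mwan'.format(row['name']))
Then you can search for the per element whose fre is "Volum_5mb":
item = node.find('./per[@fre="Volum_5mb"]')
and create it if it is missing and/or update its value: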
if item is None:  # explicit check; `not item` is also true for a found element without children
    item = ET.SubElement(node, 'per')
    item.set('fre', "Volum_5mb")
item.set('value', str(row['Volum_5mb']))
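One reason to compare against `None` here: an `Element` without children has length 0, so a truthiness test such as `not item` can treat a `per` that was found as if it were missing and append a duplicate. The sketch below wraps the find-or-create step in a hypothetical helper, `upsert_per` (not part of ElementTree), and applies it twice to show that the second call updates the existing element instead of adding another.

```python
import xml.etree.ElementTree as ET

def upsert_per(node, fre, value):
    """Update the per child whose fre matches, creating it if absent.

    Hypothetical helper for illustration; not part of ElementTree.
    """
    item = node.find('./per[@fre="{}"]'.format(fre))
    if item is None:              # only a true "not found" creates a new child
        item = ET.SubElement(node, 'per')
        item.set('fre', fre)
    item.set('value', str(value))
    return item

node = ET.fromstring('<mwan first_name="g7tty" />')

upsert_per(node, 'Volum_5mb', '89.00')   # creates the per element
upsert_per(node, 'Volum_5mb', '90.00')   # updates the same element

print(len(node.findall('per')))
print(ET.tostring(node).decode())
```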
You can loop over the list ['Volum_5mb', 'Volum_40mb', 'Volum_70mb'] to handle all columns:
for fre in ['Volum_5mb', 'Volum_40mb', 'Volum_70mb']:
    item = node.find('./per[@fre="{}"]'.format(fre))
    #print(fre, item)
    if item is None:
        item = ET.SubElement(node, 'per')
        item.set('fre', fre)
    item.set('value', str(row[fre]))
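Since the question mentions a large file, one possible variation is to index the `sec` elements by name once and then look rows up in a dictionary, rather than running an XPath search over the tree for every row. This is only a sketch under assumptions: `records` stands in for what `df.to_dict('records')` would return, the sample XML is trimmed down, each `sec` is assumed to hold a single `mwan`, and the two-decimal formatting is chosen to match the expected output in the question.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample based on the question's XML (illustrative only).
sample = '''<brand><design>
<sec name="abcd"><mwan /></sec>
<sec name="M_20_K40745170"><mwan></mwan></sec>
<sec name="M_20_K40745171"><mwan><per fre="acc" value="abc" /></mwan></sec>
</design></brand>'''

# Stand-in for df.to_dict('records'); the last name is deliberately absent from the XML.
records = [
    {'name': 'M_20_K40745170', 'Volum_5mb': 89.0, 'Volum_40mb': 44.0, 'Volum_70mb': 77.0},
    {'name': 'M_20_K40745171', 'Volum_5mb': 77.0, 'Volum_40mb': 65.0, 'Volum_70mb': 94.0},
    {'name': 'not_in_xml', 'Volum_5mb': 1.0, 'Volum_40mb': 2.0, 'Volum_70mb': 3.0},
]

root = ET.fromstring(sample)

# One pass over the tree: map each sec name to its mwan element.
mwan_by_name = {}
for sec in root.iterfind('./design/sec'):
    mwan = sec.find('.//mwan')
    if mwan is not None:
        mwan_by_name[sec.get('name')] = mwan

for rec in records:
    node = mwan_by_name.get(rec['name'])
    if node is None:
        continue  # name not present in the XML
    # Existing per children of this mwan, keyed by their fre attribute.
    existing = {per.get('fre'): per for per in node.findall('per')}
    for fre, value in rec.items():
        if fre == 'name':
            continue
        item = existing.get(fre)
        if item is None:
            item = ET.SubElement(node, 'per', {'fre': fre})
        item.set('value', '{:.2f}'.format(value))

print(ET.tostring(root).decode())
```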
Minimal working code with the example data included directly in the code; in practice you should read them from files.
text = ''' name Volum_5mb Volum_40mb Volum_70mb
1 M_20_K40745170 89.00 44.00 77.00
2 M_20_K40745171 77.00 65.00 94.00'''
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<brand by="hhdhdh" date="2014/01/01" name="OOP-112200" Insti="TGA">
<design name="OOP-112200" own="TGA" descri="" sound_db="JJKO">
<sec name="abcd" sound_freq="abcd" c_ty="pv">
<feature number="48">
<tfgt v="0.1466469683747654" y="0.0" units="sec" />
</feature>
<mwan sound_freq="abcd" first_name="g7tty" description="xyz" />
</sec>
<sec name="M_20_K40745170" sound_freq="mhr17:7907527-7907589" tension="SGCGSCGSCGSCGSC" s_c="0">
<feature number="5748">
<tfgt v="0.1466469683747654" y="0.0" units="sec" />
</feature>
<mwan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
</mwan>
</sec>
<sec name="M_20_K40745171" sound_freq="mhr17:7907528-7907599" tension="SGCGSCGSCGSHHGSC" s_c="0">
<feature number="5748">
<tfgt v="0.1466469683747654" y="0.0" units="sec" />
</feature>
<mwan sound_freq="mhr17:7907527-7907589" first_name="gtftty" description="xyz">
<xyz abc="trt" id="abc" />
<per fre="acc" value="abc" />
<per fre="xyz" value="abc" />
<per fre="yy" value="abc" />
</mwan>
</sec>
</design>
</brand>'''
import pandas as pd
import io
import xml.etree.ElementTree as ET
#df = pd.read_csv('input.csv')
df = pd.read_csv(io.StringIO(text), sep=r'\s+')
#print(df)
#tree = ET.parse('input.xml')
#root = tree.getroot()
root = ET.fromstring(xml)
tree = ET.ElementTree(root)
for index, row in df.iterrows():
    node = root.find('./design/sec[@name="{}"]//mwan'.format(row['name']))
    if node is None:
        continue  # no matching sec/mwan for this name
    for fre in ['Volum_5mb', 'Volum_40mb', 'Volum_70mb']:
        item = node.find('./per[@fre="{}"]'.format(fre))
        #print('item:', fre, '=', item)
        if item is None:
            #print('new', item, fre)
            item = ET.SubElement(node, 'per')
            #item.tail = '\n ' # to pretty print
            item.set('fre', fre)
        # str() gives '89.0'; use '{:.2f}'.format(row[fre]) to keep two decimals
        item.set('value', str(row[fre]))
    #print(ET.tostring(node).decode())
#---
print(ET.tostring(root).decode())
#tree.write('output.xml')
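For the output file, the commented-out `item.tail` line hints at pretty printing. As a possible alternative, Python 3.9 and later provide `ET.indent`, which re-indents a whole tree in place. The sketch below assumes Python 3.9+ and uses a tiny made-up tree; it writes to an in-memory buffer, and passing a filename such as 'output.xml' to `tree.write` should work the same way.

```python
import io
import xml.etree.ElementTree as ET

# Tiny illustrative tree, not the full file from the question.
root = ET.fromstring(
    '<sec name="M_20_K40745170"><mwan first_name="g7tty"></mwan></sec>'
)
node = root.find('mwan')
for fre, value in [('Volum_5mb', '89.00'), ('Volum_40mb', '44.00')]:
    ET.SubElement(node, 'per', {'fre': fre, 'value': value})

tree = ET.ElementTree(root)
ET.indent(tree, space='  ')   # Python 3.9+: adds newlines and indentation in place

buf = io.BytesIO()
# xml_declaration=True writes the <?xml ...?> header at the top of the output.
tree.write(buf, encoding='UTF-8', xml_declaration=True)
output = buf.getvalue().decode('UTF-8')
print(output)
```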