How to update and add attributes in an XML file using a Python pandas DataFrame

I am new to XML. Is there an efficient way to match text using a pandas DataFrame and update an XML file?

This is a small part of my large XML file; it still follows the proper format.

XML file (input.xml):

<?xml version="1.0" encoding="UTF-8"?>
<brand by="hhdhdh" date="2014/01/01" name="OOP-112200" Insti="TGA">
   <design name="OOP-112200" own="TGA" descri="" sound_db="JJKO">
      <sec name="abcd" sound_freq="abcd" c_ty="pv">
         <feature number="48">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="abcd" first_name="g7tty" description="xyz" />
      </sec>
      <sec name="M_20_K40745170" sound_freq="mhr17:7907527-7907589" tension="SGCGSCGSCGSCGSC" s_c="0">
         <feature number="5748">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
        </mwan>
      </sec>
      <sec name="M_20_K40745171" sound_freq="mhr17:7907528-7907599" tension="SGCGSCGSCGSHHGSC" s_c="0">
         <feature number="5748">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="mhr17:7907527-7907589" first_name="gtftty" description="xyz">
            <xyz abc="trt" id="abc" />
            <per fre="acc" value="abc" />
            <per fre="xyz" value="abc" />
            <per fre="yy" value="abc" />
         </mwan>
      </sec>
      #file continue....
   </design>
</brand>

DataFrame (used as input):

                name       Volum_5mb      Volum_40mb     Volum_70mb
1     M_20_K40745170         89.00           44.00         77.00
2     M_20_K40745171         77.00           65.00         94.00

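I want to match the elements in the name column against the name attribute of the sec nodes and, when there is a match, create new per elements from the remaining columns, as shown below. For example, if an element of df['name'] (M_20_K40745170) is present in the XML, the corresponding node in the output file should be updated with the following lines: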

<per fre="Volum_5mb" value="89.00"/>
<per fre="Volum_40mb" value="44.00"/>
<per fre="Volum_70mb" value="77.00"/>

And so on for the other rows.

I would like the output file to look like this.

Expected XML (output.xml):

<?xml version="1.0" encoding="UTF-8"?>
<brand by="hhdhdh" date="2014/01/01" name="OOP-112200" Insti="TGA">
   <design name="OOP-112200" own="TGA" descri="" sound_db="JJKO">
      <sec name="abcd" sound_freq="abcd" c_ty="pv">
         <feature number="48">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="abcd" first_name="g7tty" description="xyz" />
      </sec>
      <sec name="M_20_K40745170" sound_freq="mhr17:7907527-7907589" tension="SGCGSCGSCGSCGSC" s_c="0">
         <feature number="5748">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
            <per fre="Volum_5mb" value="89.00" />
            #new attribute FYI
            <per fre="Volum_40mb" value="44.00" />
            #new attribute FYI
            <per fre="Volum_70mb" value="77.00" />
            #new attribute FYI
         </mwan>
      </sec>
      <sec name="M_20_K40745171" sound_freq="mhr17:7907528-7907599" tension="SGCGSCGSCGSHHGSC" s_c="0">
         <feature number="5748">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="mhr17:7907527-7907589" first_name="gtftty" description="xyz">
            <xyz abc="trt" id="abc" />
            <per fre="acc" value="abc" />
            <per fre="xyz" value="abc" />
            <per fre="yy" value="abc" />
            <per fre="Volum_5mb" value="77.00" />
            #new attribute FYI
            <per fre="Volum_40mb" value="65.00" />
            #new attribute FYI
            <per fre="Volum_70mb" value="94.00" />
            #new attribute FYI
         </mwan>
      </sec>
      #file continue....
   </design>
</brand>

I am trying the xml.etree.ElementTree module:

import xml.etree.ElementTree as ET
tree = ET.parse('input.xml')
root = tree.getroot()
for i in range(len(df)):
    for node in tree.findall("./design/sec"):
        name = node.attrib.get('name')
        if name == df.loc[i, 'name']:
            print(name)

I am new to working with XML in Python, and I don't know how to add the new attributes to the XML file from a pandas DataFrame. Please help. Thanks and regards.

You should learn XML and XPath, because the main problem is not related to pandas but to XML.

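You can use a more complex XPath to find the sec node whose name is M_20_K40745170 and its child node mwan. Inside that node you then search for a per element and update it (or add a new per if it does not exist yet).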

node = root.find('./design/sec[@name="M_20_K40745170"]//mwan')
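To see what that expression selects, here is a minimal sketch on a tiny stand-in document (the `sample` string is made up for illustration and is much smaller than input.xml). It assumes only the standard xml.etree.ElementTree module, which supports a limited XPath subset including `[@attrib="value"]` predicates and `//` for descendants.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in document (hypothetical, not the real input.xml).
sample = '''<brand>
  <design>
    <sec name="abcd"><mwan first_name="a" /></sec>
    <sec name="M_20_K40745170"><feature /><mwan first_name="b" /></sec>
  </design>
</brand>'''

root = ET.fromstring(sample)

# [@name="..."] keeps only the <sec> elements with that attribute value;
# //mwan then matches any <mwan> descendant of the selected <sec>.
node = root.find('./design/sec[@name="M_20_K40745170"]//mwan')
print(node.attrib)

# find() returns None when nothing matches, so check before using the result.
missing = root.find('./design/sec[@name="no_such_name"]//mwan')
print(missing)
```

Because find() returns None for a name that is not in the file, it is worth checking the result before calling methods on it.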

You can use df.iterrows() to do this for every row of the DataFrame:

for index, row in df.iterrows():
    node = root.find('./design/sec[@name="{}"]//mwan'.format(row['name']))
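If some names in the DataFrame might not exist in the XML, the loop can skip them instead of failing later on a None node. The following is only a sketch: the small `sample` document, the `no_such_name` row and the `matched` / `unmatched` lists are made up for illustration.

```python
import pandas as pd
import xml.etree.ElementTree as ET

# Hypothetical miniature of input.xml.
sample = '''<brand><design>
  <sec name="M_20_K40745170"><mwan /></sec>
  <sec name="M_20_K40745171"><mwan /></sec>
</design></brand>'''
root = ET.fromstring(sample)

# Hypothetical DataFrame; the second name is deliberately absent from the XML.
df = pd.DataFrame({
    'name': ['M_20_K40745170', 'no_such_name'],
    'Volum_5mb': [89.0, 1.0],
})

matched, unmatched = [], []
for index, row in df.iterrows():
    node = root.find('./design/sec[@name="{}"]//mwan'.format(row['name']))
    if node is None:
        # No <sec> with this name, or no <mwan> inside it: remember and skip.
        unmatched.append(row['name'])
        continue
    matched.append(row['name'])

print('matched:', matched)
print('unmatched:', unmatched)
```

Collecting the unmatched names makes it easy to report afterwards which rows did not end up in the output file.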

Then you can search inside that node for the per element whose fre is "Volum_5mb"
item = node.find('./per[@fre="Volum_5mb"]')

and create a new element if there is none, then set (or update) its value

if item is None:  # don't use "if not item:", an element without children is falsy
    item = ET.SubElement(node, 'per')
    item.set('fre', "Volum_5mb")

item.set('value', str(row['Volum_5mb']))
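A note on the existence check: with ElementTree, `if not item:` is not the same as `if item is None:`. An element's truth value has historically depended on whether it has child elements, and recent Python versions deprecate testing it at all, so a per element that was found but has no children could be treated as missing and get duplicated. The sketch below, on a made-up one-element document, avoids truth testing and uses len() to show the childless case.

```python
import xml.etree.ElementTree as ET

# Hypothetical <mwan> that already contains the <per> we are looking for.
node = ET.fromstring('<mwan><per fre="Volum_5mb" value="1.00" /></mwan>')
item = node.find('./per[@fre="Volum_5mb"]')

# The element was found, yet it has no child elements of its own.
found = item is not None
child_count = len(item)
print(found, child_count)

# Comparing against None updates the existing element instead of adding a copy.
if item is None:
    item = ET.SubElement(node, 'per')
    item.set('fre', 'Volum_5mb')
item.set('value', '89.00')

per_count = len(node.findall('per'))
print(per_count, item.get('value'))
```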

To handle all the columns, you can loop over the list ['Volum_5mb', 'Volum_40mb', 'Volum_70mb']

for fre in ['Volum_5mb', 'Volum_40mb', 'Volum_70mb']:

    item = node.find('./per[@fre="{}"]'.format(fre))
    #print(fre, item)

    if item is None:
        item = ET.SubElement(node, 'per')
        item.set('fre', fre)

    item.set('value', str(row[fre]))
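The find-or-create step can also be wrapped in a small function. This is a sketch, not part of ElementTree: the name `set_per` is hypothetical. It also formats the value with two decimals, on the assumption that the output should show "89.00" as in the expected XML; `str(89.0)` gives '89.0'.

```python
import xml.etree.ElementTree as ET

def set_per(node, fre, value):
    """Hypothetical helper: update <per fre=...> under node, creating it if absent."""
    item = node.find('./per[@fre="{}"]'.format(fre))
    if item is None:
        item = ET.SubElement(node, 'per')
        item.set('fre', fre)
    # Two decimal places, assuming the "89.00" style of the expected output.
    item.set('value', '{:.2f}'.format(value))
    return item

# Made-up <mwan> with one unrelated <per> already present.
node = ET.fromstring('<mwan><per fre="acc" value="abc" /></mwan>')
set_per(node, 'Volum_5mb', 89.0)   # no such <per> yet, so one is created
set_per(node, 'Volum_5mb', 77.0)   # now it exists, so only the value changes

pairs = [(p.get('fre'), p.get('value')) for p in node.findall('per')]
print(pairs)
```

Calling it twice with the same fre leaves a single per element, so running the script again on an already updated file should not add duplicates.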

Minimal working code with the sample data embedded directly in the code; in practice you should read the data from files.

text = '''                name       Volum_5mb      Volum_40mb     Volum_70mb
1     M_20_K40745170         89.00           44.00         77.00
2     M_20_K40745171         77.00           65.00         94.00'''

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<brand by="hhdhdh" date="2014/01/01" name="OOP-112200" Insti="TGA">
   <design name="OOP-112200" own="TGA" descri="" sound_db="JJKO">
      <sec name="abcd" sound_freq="abcd" c_ty="pv">
         <feature number="48">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="abcd" first_name="g7tty" description="xyz" />
      </sec>
      <sec name="M_20_K40745170" sound_freq="mhr17:7907527-7907589" tension="SGCGSCGSCGSCGSC" s_c="0">
         <feature number="5748">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
         </mwan>
      </sec>
      <sec name="M_20_K40745171" sound_freq="mhr17:7907528-7907599" tension="SGCGSCGSCGSHHGSC" s_c="0">
         <feature number="5748">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="mhr17:7907527-7907589" first_name="gtftty" description="xyz">
            <xyz abc="trt" id="abc" />
            <per fre="acc" value="abc" />
            <per fre="xyz" value="abc" />
            <per fre="yy" value="abc" />
         </mwan>
      </sec>
   </design>
</brand>'''

import pandas as pd
import io
import xml.etree.ElementTree as ET

#df = pd.read_csv('input.csv')
df = pd.read_csv(io.StringIO(text), sep=r'\s+')  # raw string avoids the invalid-escape warning
#print(df)

#tree = ET.parse('input.xml')
#root = tree.getroot()
root = ET.fromstring(xml)
tree = ET.ElementTree(root)

for index, row in df.iterrows():
    node = root.find('./design/sec[@name="{}"]//mwan'.format(row['name']))
    if node is None:  # no <sec> with this name (or no <mwan> inside it)
        continue
    for fre in ['Volum_5mb', 'Volum_40mb', 'Volum_70mb']:

        item = node.find('./per[@fre="{}"]'.format(fre))
        #print('item:', fre, '=', item)

        if item is None:
            #print('new', item, fre)
            item = ET.SubElement(node, 'per')
            #item.tail = '\n         '  # to pretty print
            item.set('fre', fre)

        item.set('value', str(row[fre]))

    #print(ET.tostring(node).decode())
    
#---
    
print(ET.tostring(root).decode())  # decode bytes to str for readable output
#tree.write('output.xml')
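Newly added elements carry no whitespace, so they end up on one line in the output. A possible way to get indented output with an XML declaration is sketched below. It assumes Python 3.9 or newer for ET.indent(), and it writes to an in-memory buffer instead of output.xml only to keep the example self-contained; the one-line document is made up.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature document with one freshly added <per>.
root = ET.fromstring('<brand><design><sec name="abcd"><mwan /></sec></design></brand>')
node = root.find('./design/sec[@name="abcd"]//mwan')
ET.SubElement(node, 'per', {'fre': 'Volum_5mb', 'value': '89.00'})

tree = ET.ElementTree(root)
ET.indent(tree, space='   ')  # Python 3.9+; rewrites the whitespace in place

# With a real file this would be tree.write('output.xml', ...).
buffer = io.BytesIO()
tree.write(buffer, encoding='UTF-8', xml_declaration=True)
output = buffer.getvalue().decode('UTF-8')
print(output)
```

ET.indent() replaces the existing whitespace between elements, so the layout of the original file is normalized rather than preserved.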

Documentation: Modifying an XML File