使用 python 替换 xml 标签内容

Question

我有一个包含一些数据的 xml 文件。

<Emp>
<Name>Raja</Name>
<Location>
     <city>ABC</city>
     <geocode>123</geocode>
     <state>XYZ</state> 
</Location>
<sal>100</sal>
<type>temp</type> 
</Emp>

所以 xml 文件中的位置信息是错误的，我必须替换它。

我在 python.

中构建了具有更正值的位置信息

variable = '''
    <Location isupdated=1>
         <city>MyCity</city>
         <geocode>10.12</geocode>
         <state>MyState</state> 
    </Location>'''

因此，位置标签应该用新信息替换。在 python.

中是否有任何简单的方法来更新它

我想要最终结果数据，例如

<Emp>
<Name>Raja</Name>
<Location isupdated=1>
         <city>MyCity</city>
         <geocode>10.12</geocode>
         <state>MyState</state>
</Location>
<sal>100</sal>
<type>temp</type> 
</Emp>

有什么想法吗？？

谢谢。

Answer 1

更新 - XML 解析器实现：因为替换特定的 <Location> 标签需要修改正则表达式，我提供了一个更通用和更安全的替代方案基于 ElementTree 解析器的实现（如上文所述 @stribizhev 和 @Saket Mittal）。

我必须添加一个根元素 <Emps>（制作一个有效的 xml 文档，需要根元素），我还选择通过 [= 过滤要编辑的位置15=] 标签（但可能是每个字段）：

#!/usr/bin/python
# Alternative Implementation with ElementTree XML Parser

xml = '''\
<Emps>
    <Emp>
        <Name>Raja</Name>
        <Location>
            <city>ABC</city>
            <geocode>123</geocode>
            <state>XYZ</state>
        </Location>
        <sal>100</sal>
        <type>temp</type>
    </Emp>
    <Emp>
        <Name>GsusRecovery</Name>
        <Location>
            <city>Torino</city>
            <geocode>456</geocode>
            <state>UVW</state>
        </Location>
        <sal>120</sal>
        <type>perm</type>
    </Emp>
</Emps>
'''

from xml.etree import ElementTree as ET
# tree = ET.parse('input.xml')  # decomment to parse xml from file
tree = ET.ElementTree(ET.fromstring(xml))
root = tree.getroot()

for location in root.iter('Location'):
    if location.find('city').text == 'Torino':
        location.set("isupdated", "1")
        location.find('city').text = 'MyCity'
        location.find('geocode').text = '10.12'
        location.find('state').text = 'MyState'

print ET.tostring(root, encoding='utf8', method='xml')
# tree.write('output.xml') # decomment if you want to write to file

代码的在线可运行版本here

之前的正则表达式实现

这是使用惰性修饰符 .*? 和点全部 (?s):

的可能实现

#!/usr/bin/python

import re

xml = '''\
<Emp>
<Name>Raja</Name>
<Location>
     <city>ABC</city>
     <geocode>123</geocode>
     <state>XYZ</state>
</Location>
</Emp>'''

locUpdate = '''\
    <Location isupdated=1>
         <city>MyCity</city>
         <geocode>10.12</geocode>
         <state>MyState</state>
    </Location>'''

output = re.sub(r"(?s)<Location>.*?</Location>", r"%s" % locUpdate, xml)

print output

您可以在线测试代码here

警告：如果 xml 输入中有多个 <Location> 标签，则正则表达式将它们全部替换为 locUpdate。您必须使用：

# (note the last ``1`` at the end to limit the substitution only to the first occurrence)
output = re.sub(r"(?s)<Location>.*?</Location>", r"%s" % locUpdate, xml, 1)

使用 python 替换 xml 标签内容

Replace xml tag contents using python

python

regex

xml

lxml