Find and Replace CDATA Attribute Values in XML - Python

Question

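I'm trying to demonstrate finding and replacing XML attribute values, similar to what was asked in a related question, but for content contained inside a CDATA string. Specifically, I'd like to know whether CDATA attribute values can be found by index and replaced with new values. I'm trying to replace the first and second attribute values in the first set of 'td' subelements, and the second and third attribute values in the second set of 'td' subelements. Below are the XML, the script I'm working with, and the desired output XML containing the new values: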

XML ("foo_bar_CDATA.xml"):

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
    <description>
    <![CDATA[
    <html>
    <head>
        <body>
            <div id="view">
                <div class="item">
                    <tr id="source">
                        <td class="raster">Source</td>
                        <td class="number">1800</td>
                        <td class="number">2100</td>
                    </tr>
                    <tr id="preview">
                        <td class="raster">Preview</td>
                        <td class="number">1100</td>
                        <td class="number">1500</td>
                    </tr>
                </div>
            </div>
        </body>
    </head>
    </html>
    ]]>
    </description>   
</Overlay></kml>

Script:

import lxml.etree as ET
xml = ET.parse("C:\Users\mdl518\Desktop\bar_foo_CDATA.xml")
tree=xml.getroot().getchildren()[0][1]

val_1 = 1900
val_2 = 2000
val_3 = 3000
val_4 = 4000

# Find and replace the "td" subelement attribute values with the new values (val_"x") 
for elem in tree.getiterator():
    if elem.text:
        elem.text=elem.text.replace('Source',val_1)
    if elem.text:
        elem.text=elem.text.replace('1800',val_2)
    if elem.text:
        elem.text=elem.text.replace('1100',val_3)
    if elem.text:
        elem.text=elem.text.replace('1500',val_4)
    print(elem.text)

    output = ET.tostring(tree, 
                 encoding="UTF-8",
                 method="xml", 
                 xml_declaration=True, 
                 pretty_print=True)

    print(output.decode("utf-8"))

Desired output XML:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
    <description>
    <![CDATA[
    <html>
    <head>
        <body>
            <div id="view">
                <div class="item">
                    <tr id="source">
                        <td class="raster">1900</td>
                        <td class="number">2000</td>
                        <td class="number">2100</td>
                    </tr>
                    <tr id="preview">
                        <td class="raster">Preview</td>
                        <td class="number">3000</td>
                        <td class="number">4000</td>
                    </tr>
                </div>
            </div>
        </body>
    </head>
    </html>
    ]]>
    </description>   
</Overlay></kml>

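My main issue is indexing and reading the values properly instead of hard-coding the strings to look for; ideally I would locate them by index and then find/replace them with the new values. The approach above seems to work for XML without a CDATA string, but I can't work out how to parse the CDATA content correctly, including writing the XML to file properly. In addition, the opening and closing tag characters (<, >) are incorrectly written to the XML as the escaped entities &lt; and &gt;. Any help is much appreciated!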

Answer 1

Since the CDATA content is an HTML string, I would extract it from the XML, make the changes to it, and then reinsert it into the XML:

import lxml.etree as ET

doc = ET.parse("foo_bar_CDATA.xml")

# First edit: pull the HTML string out of the XML and parse it on its own
description = doc.xpath('//*[local-name()="description"]')[0]
cd = ET.fromstring(description.text)

vals = ["1900", "2000", "3000", "4000"]  # new values
rems = ["Source", "1800", "1100", "1500"]  # values to be replaced
for target in cd.xpath('//tr//td'):
    if target.text in rems:
        target.text = vals[rems.index(target.text)]

# Second edit: put the modified HTML back into the XML as CDATA
description.text = ET.CDATA(ET.tostring(cd, encoding="unicode"))

print(ET.tostring(doc, encoding="unicode"))

The output should be your expected output.
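
Since the question asks about locating the values by index instead of by their current text, the same extract/modify/reinsert idea can be sketched with positional lookups. This is only an illustration under some assumptions: the KML is inlined as a shortened bytes literal so the snippet runs on its own, and the names `REPLACEMENTS` and `replace_cells_by_index` are made up for this example (zero-based row and cell indices).

```python
import lxml.etree as ET

# Shortened copy of the question's file, inlined so the sketch is self-contained.
KML = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Overlay>
    <description><![CDATA[
    <html><body>
        <tr id="source">
            <td class="raster">Source</td>
            <td class="number">1800</td>
            <td class="number">2100</td>
        </tr>
        <tr id="preview">
            <td class="raster">Preview</td>
            <td class="number">1100</td>
            <td class="number">1500</td>
        </tr>
    </body></html>
    ]]></description>
</Overlay></kml>
"""

# Hypothetical mapping: (row index, cell index) -> new text, both zero-based.
REPLACEMENTS = {
    (0, 0): "1900",  # first row, first td
    (0, 1): "2000",  # first row, second td
    (1, 1): "3000",  # second row, second td
    (1, 2): "4000",  # second row, third td
}


def replace_cells_by_index(kml_bytes, replacements):
    """Replace td text inside the CDATA payload, addressed by position."""
    root = ET.fromstring(kml_bytes)
    description = root.xpath('//*[local-name()="description"]')[0]

    # The default parser hands the CDATA payload back as plain text.
    html = ET.fromstring(description.text)
    rows = html.xpath('//tr')
    for (row_idx, cell_idx), new_text in replacements.items():
        rows[row_idx].findall('td')[cell_idx].text = new_text

    # Wrap the serialized HTML in CDATA so '<' and '>' are not escaped.
    description.text = ET.CDATA(ET.tostring(html, encoding="unicode"))
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


result = replace_cells_by_index(KML, REPLACEMENTS)
print(result.decode("utf-8"))
```

Wrapping the string in `ET.CDATA` is what keeps the markup from being written as `&lt;` and `&gt;`; assigning the plain string to `.text` would escape it. To write the result to disk, the returned bytes can be written to a file opened in binary mode, or `ET.ElementTree(root).write(path, encoding="UTF-8", xml_declaration=True)` can be used on the modified root instead.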

python

xml

parsing

lxml

cdata