Find and Replace CDATA Attribute Values in XML - Python
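I am trying to demonstrate find/replace functionality for XML attributes, similar to what was asked in a related question, but for content contained in a CDATA string. Specifically, I want to know whether it is possible to find CDATA attribute values by index and replace them with new values. I am trying to replace the first and second attribute values in the first set of 'td' subelements, and the second and third attribute values in the second set of 'td' subelements. Below are the XML, the script I am using, and the desired output XML with the new values: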
XML ("foo_bar_CDATA.xml"):
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
<description>
<![CDATA[
<html>
<head>
<body>
<div id="view">
<div class="item">
<tr id="source">
<td class="raster">Source</td>
<td class="number">1800</td>
<td class="number">2100</td>
</tr>
<tr id="preview">
<td class="raster">Preview</td>
<td class="number">1100</td>
<td class="number">1500</td>
</tr>
</div>
</div>
</body>
</head>
</html>
]]>
</description>
</Overlay></kml>
Script:
import lxml.etree as ET
xml = ET.parse("C:\Users\mdl518\Desktop\bar_foo_CDATA.xml")
tree=xml.getroot().getchildren()[0][1]
val_1 = 1900
val_2 = 2000
val_3 = 3000
val_4 = 4000
# Find and replace the "td" subelement attribute values with the new values (val_"x")
for elem in tree.getiterator():
    if elem.text:
        elem.text=elem.text.replace('Source',val_1)
    if elem.text:
        elem.text=elem.text.replace('1800',val_2)
    if elem.text:
        elem.text=elem.text.replace('1100',val_3)
    if elem.text:
        elem.text=elem.text.replace('1500',val_4)
    print(elem.text)
output = ET.tostring(tree,
                     encoding="UTF-8",
                     method="xml",
                     xml_declaration=True,
                     pretty_print=True)
print(output.decode("utf-8"))
Desired output XML:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
<description>
<![CDATA[
<html>
<head>
<body>
<div id="view">
<div class="item">
<tr id="source">
<td class="raster">1900</td>
<td class="number">2000</td>
<td class="number">2100</td>
</tr>
<tr id="preview">
<td class="raster">Preview</td>
<td class="number">3000</td>
<td class="number">4000</td>
</tr>
</div>
</div>
</body>
</head>
</html>
]]>
</description>
</Overlay></kml>
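My main problem is properly indexing/reading the attributes rather than hard-coding the desired values, since indexing them properly for find/replace with the new values would be ideal. The approach above seems to work for XML without a CDATA string, but I cannot work out how to parse the CDATA content correctly, including writing the XML to a file properly. In addition, the opening and closing tag characters (<, >) are incorrectly written to the XML as &lt; and &gt;. Any help is much appreciated!
Since the CDATA is an HTML string, I would extract it from the XML, make the changes, and then insert it back into the XML: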
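from lxml import etree

# Assumes the XML file from the question is in the working directory
doc = etree.parse("foo_bar_CDATA.xml")
desc = doc.xpath('//*[local-name()="description"]')[0]

# first edit: take the HTML string out of the XML and parse it on its own
cd = etree.fromstring(desc.text)
vals = ["1900", "2000", "3000", "4000"]
rems = ["Source", "1800", "1100", "1500"]
targets = cd.xpath('//tr//td')
for target in targets:
    if target.text in rems:
        # replace each old value with the new value at the same position in vals
        target.text = vals[rems.index(target.text)]

# second edit: ... and back into the XML as CDATA
desc.text = etree.CDATA(etree.tostring(cd, encoding="unicode"))
print(etree.tostring(doc, encoding="unicode"))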
The output should be your expected output.
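The question also asks about replacing values by position rather than by matching their current text. A minimal sketch of that variant follows, under a few assumptions: the `KML` string is a trimmed copy of the question's document, and the names `REPLACEMENTS` and `replace_cells` are hypothetical helpers introduced only for illustration. Each cell is addressed by its row `id` and a zero-based index within that row.

```python
from lxml import etree

# Trimmed copy of the question's document (assumption: same structure, fewer wrappers).
KML = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Overlay>
<description><![CDATA[
<html><body><div id="view">
<tr id="source">
<td class="raster">Source</td>
<td class="number">1800</td>
<td class="number">2100</td>
</tr>
<tr id="preview">
<td class="raster">Preview</td>
<td class="number">1100</td>
<td class="number">1500</td>
</tr>
</div></body></html>
]]></description>
</Overlay>
</kml>"""

# Hypothetical mapping: (row id, zero-based cell index) -> new text.
REPLACEMENTS = {
    ("source", 0): "1900",
    ("source", 1): "2000",
    ("preview", 1): "3000",
    ("preview", 2): "4000",
}


def replace_cells(kml_bytes, replacements):
    """Replace <td> text inside the CDATA payload by row id and cell index."""
    root = etree.fromstring(kml_bytes)
    desc = root.xpath('//*[local-name()="description"]')[0]
    # The CDATA payload is read back as plain text; parse it as its own tree.
    html = etree.fromstring(desc.text)
    for (row_id, idx), new_text in replacements.items():
        cells = html.xpath('//tr[@id=$rid]/td', rid=row_id)
        cells[idx].text = new_text
    # Wrap the serialized HTML in CDATA so its markup is not escaped.
    desc.text = etree.CDATA(etree.tostring(html, encoding="unicode"))
    return etree.tostring(root, encoding="UTF-8", xml_declaration=True)


result = replace_cells(KML, REPLACEMENTS).decode("utf-8")
print(result)
```

To write the result to a file instead of printing it, the bytes returned by `replace_cells` can be written in binary mode, for example `open("out.kml", "wb").write(...)`, where `out.kml` is a placeholder name. Note that this only works while the HTML inside the CDATA section is well-formed XML, as it is in the question.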
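On the escaped tags mentioned in the question: when markup is assigned to an element's text as a plain string, lxml treats it as character data and escapes it on output, which is most likely where the `&lt;` and `&gt;` come from. Wrapping the string in `etree.CDATA` keeps it verbatim. A small sketch of the difference, using a made-up one-cell HTML fragment:

```python
from lxml import etree

# Made-up fragment standing in for the HTML stored in <description>.
html = "<td>1800</td>"

# Plain string: the markup is serialized as escaped character data.
plain = etree.Element("description")
plain.text = html
plain_xml = etree.tostring(plain, encoding="unicode")

# CDATA wrapper: the same string is written verbatim inside a CDATA section.
wrapped = etree.Element("description")
wrapped.text = etree.CDATA(html)
wrapped_xml = etree.tostring(wrapped, encoding="unicode")

print(plain_xml)
print(wrapped_xml)
```

Both forms carry the same text as far as an XML parser is concerned; the CDATA form simply stays readable and matches the layout of the original file.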