How do I remove a child node from XML in Python?
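I have a list of nodes that I want to remove from an XML document, but I'm running into problems removing the elements and writing the modified document to a new XML file.
Here is the Python program I wrote (I'm using ElementTree):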
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('autogen_test.xml')
root = tree.getroot()
keeper_data = ['4294905264']
instances = tree.findall('./DIMENSION/DIMENSION_NODE/DIMENSION_NODE')
removeList = list()
for instance in instances:
    #print instance
    data1 = instance.find('./DVAL/DVAL_ID')
    if data1.attrib.get("ID") not in keeper_data:
        removeList.append(instance)
for tag in removeList:
    parent = tree.findall('./DIMENSION/DIMENSION_NODE/DIMENSION_NODE')
    parent.remove(tag)
tree.write("out.xml")
My sample XML is below (this is the standard input, and I can't modify it):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DIMENSIONS SYSTEM "dimensions.dtd">
<DIMENSIONS>
  <NUM_DVALS>88816</NUM_DVALS>
  <DIMENSION NAME="Brand" SRC_FILE="" SRC_TYPE="INTERNAL">
    <DIMENSION_ID ID="4294905334"/>
    <DIMENSION_NODE>
      <DVAL TYPE="EXACT">
        <DVAL_ID ID="2"/>
        <SYN DISPLAY="TRUE" SEARCH="FALSE" CLASSIFY="FALSE">Brand</SYN>
      </DVAL>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT">
          <DVAL_ID ID="4294905325"/>
          <SYN DISPLAY="TRUE" SEARCH="TRUE" CLASSIFY="TRUE">hanes</SYN>
        </DVAL>
      </DIMENSION_NODE>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT">
          <DVAL_ID ID="4294905315"/>
          <SYN DISPLAY="TRUE" SEARCH="TRUE" CLASSIFY="TRUE">lee</SYN>
        </DVAL>
      </DIMENSION_NODE>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT">
          <DVAL_ID ID="4294905281"/>
          <SYN DISPLAY="TRUE" SEARCH="TRUE" CLASSIFY="TRUE">levi's</SYN>
        </DVAL>
      </DIMENSION_NODE>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT">
          <DVAL_ID ID="4294905264"/>
          <SYN DISPLAY="TRUE" SEARCH="TRUE" CLASSIFY="TRUE">braun</SYN>
        </DVAL>
      </DIMENSION_NODE>
    </DIMENSION_NODE>
  </DIMENSION>
</DIMENSIONS>
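Even after looping through the list and finding all the nodes to remove, tree.write("out.xml") always writes out the original XML. Basically, I need to remove the identified nodes from the original XML.
Expected output: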
<DIMENSIONS>
  <NUM_DVALS>88816</NUM_DVALS>
  <DIMENSION NAME="Brand" SRC_FILE="" SRC_TYPE="INTERNAL">
    <DIMENSION_ID ID="4294905334" />
    <DIMENSION_NODE>
      <DVAL TYPE="EXACT">
        <DVAL_ID ID="2" />
        <SYN CLASSIFY="FALSE" DISPLAY="TRUE" SEARCH="FALSE">Brand</SYN>
      </DVAL>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT">
          <DVAL_ID ID="4294905264" />
          <SYN CLASSIFY="TRUE" DISPLAY="TRUE" SEARCH="TRUE">braun</SYN>
        </DVAL>
      </DIMENSION_NODE>
    </DIMENSION_NODE>
  </DIMENSION>
</DIMENSIONS>
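All of the DIMENSION_NODE elements you want to remove share the same parent DIMENSION_NODE, so it is more efficient to fetch that parent once, before looping over removeList. More importantly, you need the parent DIMENSION_NODE, not the child DIMENSION_NODEs, so the correct XPath is ./DIMENSION/DIMENSION_NODE. Your current loop calls tree.findall(), which returns an ordinary Python list, so parent.remove(tag) removes tag from that list rather than from the tree; that is why out.xml comes out unchanged. Use tree.find() to get the parent element itself and call its remove() method. In short, try replacing the second for loop with the following: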
parent = tree.find('./DIMENSION/DIMENSION_NODE')
for tag in removeList:
    parent.remove(tag)
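Why the original loop silently did nothing can be shown with a small, made-up document (the names root, parent and child are placeholders, not taken from the question): findall() hands back a plain Python list, so calling remove() on it edits only that list. Only Element.remove() on the actual parent element changes what gets serialized.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in document: one parent element with two children.
root = ET.fromstring('<root><parent><child id="a"/><child id="b"/></parent></root>')

# findall() returns an ordinary Python list of the matching elements.
children = root.findall('./parent/child')
print(type(children))  # <class 'list'>

# list.remove() drops the element from this temporary list only;
# the tree itself still has both children.
children.remove(children[0])
print(len(root.find('./parent')))  # 2

# Element.remove() on the real parent detaches the child from the tree.
parent = root.find('./parent')
parent.remove(parent.find('./child'))
print(ET.tostring(root, encoding='unicode'))
# <root><parent><child id="b" /></parent></root>
```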
Here is a complete working example as a demonstration (just replace the value of source with the actual XML):
import xml.etree.ElementTree as ET

source = """replace with the XML in question"""

root = ET.fromstring(source)
keeper_data = ['4294905264']
instances = root.findall('.//DIMENSION/DIMENSION_NODE/DIMENSION_NODE')
removeList = list()
for instance in instances:
    data1 = instance.find('./DVAL/DVAL_ID')
    if data1.attrib.get("ID") not in keeper_data:
        removeList.append(instance)

# Fetch the shared parent once, then detach each unwanted child from it
parent = root.find('.//DIMENSION/DIMENSION_NODE')
for tag in removeList:
    parent.remove(tag)

# encoding="unicode" makes tostring() return str instead of bytes on Python 3
print(ET.tostring(root, encoding="unicode"))
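Given the XML from the question as the value of source, the output is as follows (Python versions before 3.8 sort attributes alphabetically, as shown here; 3.8 and later keep the original DISPLAY, SEARCH, CLASSIFY order):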
<DIMENSIONS>
  <NUM_DVALS>88816</NUM_DVALS>
  <DIMENSION NAME="Brand" SRC_FILE="" SRC_TYPE="INTERNAL">
    <DIMENSION_ID ID="4294905334" />
    <DIMENSION_NODE>
      <DVAL TYPE="EXACT">
        <DVAL_ID ID="2" />
        <SYN CLASSIFY="FALSE" DISPLAY="TRUE" SEARCH="FALSE">Brand</SYN>
      </DVAL>
      <DIMENSION_NODE>
        <DVAL TYPE="EXACT">
          <DVAL_ID ID="4294905264" />
          <SYN CLASSIFY="TRUE" DISPLAY="TRUE" SEARCH="TRUE">braun</SYN>
        </DVAL>
      </DIMENSION_NODE>
    </DIMENSION_NODE>
  </DIMENSION>
</DIMENSIONS>
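The question reads autogen_test.xml and writes out.xml, so here is a sketch of the same fix in that file-based form. It additionally assumes the nodes to drop might not always share one parent: ElementTree elements have no parent pointer, so the sketch builds a child-to-parent dict before removing anything. The helper name prune_dimension_nodes and its return value are made up for illustration.

```python
import xml.etree.ElementTree as ET


def prune_dimension_nodes(in_path, out_path, keep_ids):
    """Hypothetical helper: drop nested DIMENSION_NODEs whose DVAL_ID/@ID is not in keep_ids.

    The XPath and element names follow the sample XML in the question.
    Returns the number of removed nodes.
    """
    tree = ET.parse(in_path)
    root = tree.getroot()

    # Elements do not know their parent, so map every child to its parent once.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    # Collect first, remove afterwards, so the tree is not mutated mid-search.
    to_remove = []
    for node in root.findall('./DIMENSION/DIMENSION_NODE/DIMENSION_NODE'):
        dval_id = node.find('./DVAL/DVAL_ID')
        # A node without a DVAL_ID is treated as "not kept" (an assumption).
        node_id = dval_id.get('ID') if dval_id is not None else None
        if node_id not in keep_ids:
            to_remove.append(node)

    for node in to_remove:
        # Element.remove() changes the tree itself, unlike list.remove().
        parent_of[node].remove(node)

    # Writes an XML declaration; the <!DOCTYPE ...> line is not carried over.
    tree.write(out_path, encoding='UTF-8', xml_declaration=True)
    return len(to_remove)
```

With the question's data, prune_dimension_nodes('autogen_test.xml', 'out.xml', ['4294905264']) should remove the hanes, lee and levi's nodes. Because ElementTree does not keep the DOCTYPE declaration, it has to be added back separately if whatever consumes out.xml depends on it.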