How do I remove a child node from XML in Python?

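I have a list of nodes that I want to remove from an XML document, but I am running into problems removing the elements and writing the modified document to a new XML file.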

Here is the Python program I wrote (I am using ElementTree):

from xml.etree.ElementTree import ElementTree

tree = ElementTree()
tree.parse('autogen_test.xml')
root = tree.getroot()
keeper_data = ['4294905264']
instances = tree.findall('./DIMENSION/DIMENSION_NODE/DIMENSION_NODE')
removeList = list()
for instance in instances:
    #print instance
    data1 = instance.find('./DVAL/DVAL_ID')
    if data1.attrib.get("ID") not in keeper_data:
        removeList.append(instance)
for tag in removeList:
    parent = tree.findall('./DIMENSION/DIMENSION_NODE/DIMENSION_NODE')
    parent.remove(tag)
tree.write("out.xml")

My sample XML is below (this is a standard input that I cannot modify):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DIMENSIONS SYSTEM "dimensions.dtd">
<DIMENSIONS>
   <NUM_DVALS>88816</NUM_DVALS>
   <DIMENSION NAME="Brand" SRC_FILE="" SRC_TYPE="INTERNAL">
      <DIMENSION_ID ID="4294905334"/>
      <DIMENSION_NODE>
         <DVAL TYPE="EXACT">
            <DVAL_ID ID="2"/>
            <SYN DISPLAY="TRUE" SEARCH="FALSE" CLASSIFY="FALSE">Brand</SYN>
         </DVAL>
         <DIMENSION_NODE>
            <DVAL TYPE="EXACT">
               <DVAL_ID ID="4294905325"/>
               <SYN DISPLAY="TRUE" SEARCH="TRUE" CLASSIFY="TRUE">hanes</SYN>
            </DVAL>
         </DIMENSION_NODE>
         <DIMENSION_NODE>
            <DVAL TYPE="EXACT">
               <DVAL_ID ID="4294905315"/>
               <SYN DISPLAY="TRUE" SEARCH="TRUE" CLASSIFY="TRUE">lee</SYN>
            </DVAL>
         </DIMENSION_NODE>
         <DIMENSION_NODE>
            <DVAL TYPE="EXACT">
               <DVAL_ID ID="4294905281"/>
               <SYN DISPLAY="TRUE" SEARCH="TRUE" CLASSIFY="TRUE">levi's</SYN>
            </DVAL>
         </DIMENSION_NODE>
         <DIMENSION_NODE>
            <DVAL TYPE="EXACT">
               <DVAL_ID ID="4294905264"/>
               <SYN DISPLAY="TRUE" SEARCH="TRUE" CLASSIFY="TRUE">braun</SYN>
            </DVAL>
         </DIMENSION_NODE>
        </DIMENSION_NODE>
   </DIMENSION>
   </DIMENSIONS>

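Even after iterating over the list and finding all the nodes to remove, tree.write("out.xml") always writes out the original XML. Basically, I need to remove the identified nodes from the original XML.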

Expected output:

<DIMENSIONS>
   <NUM_DVALS>88816</NUM_DVALS>
   <DIMENSION NAME="Brand" SRC_FILE="" SRC_TYPE="INTERNAL">
      <DIMENSION_ID ID="4294905334" />
         <DIMENSION_NODE>
            <DVAL TYPE="EXACT">
               <DVAL_ID ID="4294905264" />
               <SYN CLASSIFY="TRUE" DISPLAY="TRUE" SEARCH="TRUE">braun</SYN>
            </DVAL>
         </DIMENSION_NODE>
        </DIMENSION_NODE>
   </DIMENSION>
   </DIMENSIONS>

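All of the DIMENSION_NODE elements to be removed share the same parent DIMENSION_NODE, so it is more efficient to fetch that parent once, before iterating over removeList. More importantly, you want the parent DIMENSION_NODE rather than the child DIMENSION_NODE elements, so the correct XPath is ./DIMENSION/DIMENSION_NODE. In your loop, findall() returns a plain Python list of the child elements, so calling remove() on it only takes the element out of that list and leaves the tree untouched, which is why out.xml always matches the original. In short, try replacing the second for loop with the following: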

parent = tree.find('./DIMENSION/DIMENSION_NODE')
for tag in removeList:
    parent.remove(tag)  
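
To make the difference concrete, here is a minimal sketch using a made-up two-child document (the `parent` and `child` tag names and the `id` values are hypothetical, chosen only for illustration). It contrasts removing an item from the list returned by findall() with calling remove() on the parent element itself:

```python
import xml.etree.ElementTree as ET

# Hypothetical two-child document, used only for illustration.
root = ET.fromstring("<parent><child id='a'/><child id='b'/></parent>")

matches = root.findall("./child")    # a plain Python list of Element objects
matches.remove(matches[0])           # shrinks the list only
count_after_list_remove = len(root)  # the tree still has both children

root.remove(root.find("./child"))    # Element.remove detaches the child from the tree
count_after_element_remove = len(root)

print(count_after_list_remove, count_after_element_remove)
```

Only the second call changes what a later tree.write() would emit.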

Here is a complete working example that demonstrates the fix (just replace the value of source with the actual XML):

import xml.etree.ElementTree as ET

source = """replace with the XML in question"""

root = ET.fromstring(source)
keeper_data = ['4294905264']
instances = root.findall('.//DIMENSION/DIMENSION_NODE/DIMENSION_NODE')
removeList = list()
for instance in instances:
    data1 = instance.find('./DVAL/DVAL_ID')
    if data1.attrib.get("ID") not in keeper_data:
        removeList.append(instance)
parent = root.find('.//DIMENSION/DIMENSION_NODE')
for tag in removeList:
    parent.remove(tag)

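# encoding="unicode" makes tostring() return a str; on Python 3 the default is bytes
print(ET.tostring(root, encoding="unicode"))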

Given the XML from the question as the value of the source variable, the output is:

<DIMENSIONS>
   <NUM_DVALS>88816</NUM_DVALS>
   <DIMENSION NAME="Brand" SRC_FILE="" SRC_TYPE="INTERNAL">
      <DIMENSION_ID ID="4294905334" />
      <DIMENSION_NODE>
         <DVAL TYPE="EXACT">
            <DVAL_ID ID="2" />
            <SYN CLASSIFY="FALSE" DISPLAY="TRUE" SEARCH="FALSE">Brand</SYN>
         </DVAL>
         <DIMENSION_NODE>
            <DVAL TYPE="EXACT">
               <DVAL_ID ID="4294905264" />
               <SYN CLASSIFY="TRUE" DISPLAY="TRUE" SEARCH="TRUE">braun</SYN>
            </DVAL>
         </DIMENSION_NODE>
        </DIMENSION_NODE>
   </DIMENSION>
</DIMENSIONS>
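
If the nodes to remove do not all share one known parent, a common alternative is to build a child-to-parent lookup first, because ElementTree elements do not keep a reference to their parent. The sketch below is one possible way to do that, not the only one. The helper name `remove_unlisted_nodes` and the trimmed-down sample document are assumptions made for illustration, and skipping nodes that have no DVAL_ID is a design choice of this sketch rather than something the question specifies.

```python
import xml.etree.ElementTree as ET


def remove_unlisted_nodes(root, keep_ids):
    """Remove nested DIMENSION_NODE elements whose DVAL_ID is not in keep_ids.

    Hypothetical helper for illustration. Returns the number of removed nodes.
    """
    # Elements do not store their parent, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    removed = 0
    # findall() returns a list, so the tree can be mutated safely while looping.
    for node in root.findall("./DIMENSION/DIMENSION_NODE/DIMENSION_NODE"):
        dval_id = node.find("./DVAL/DVAL_ID")
        if dval_id is None:
            continue  # assumption: leave nodes without an ID untouched
        if dval_id.get("ID") not in keep_ids:
            parent_of[node].remove(node)
            removed += 1
    return removed


# Trimmed-down stand-in for the XML in the question.
sample = """<DIMENSIONS>
  <DIMENSION NAME="Brand">
    <DIMENSION_NODE>
      <DVAL><DVAL_ID ID="2"/></DVAL>
      <DIMENSION_NODE><DVAL><DVAL_ID ID="4294905325"/></DVAL></DIMENSION_NODE>
      <DIMENSION_NODE><DVAL><DVAL_ID ID="4294905264"/></DVAL></DIMENSION_NODE>
    </DIMENSION_NODE>
  </DIMENSION>
</DIMENSIONS>"""

root = ET.fromstring(sample)
removed_count = remove_unlisted_nodes(root, {"4294905264"})
remaining_ids = [e.get("ID") for e in root.iter("DVAL_ID")]
print(removed_count, remaining_ids)
```

To save the result to a file instead of printing it, wrapping the root as ET.ElementTree(root).write("out.xml", encoding="UTF-8", xml_declaration=True) should work.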