How to delete a subnode from XML that is not present in a list

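I have an XML document with a varying number of node levels. I want to check every node in the tree and remove it only if neither it nor any of its children is in the list.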

<node1>
  <xxx>
     stuff
  </xxx>
  <subnode2>
     <yyy>
        stuf2
     </yyy>
  </subnode2>
</node1>

My problem is that if 'yyy' is in the dontRemove list but its parent is not, yyy still gets cleared.

import xml.etree.ElementTree as ET

document = ET.parse("foo.xml")
root = document.getroot()

#list of nodes
toRemove = root.findall('.//')

#list of tags that shouldn't be removed
dontRemove = ['xxx','yyy']

#take element from root and compare it with "dont remove it", if it's present remove from removing list
for element in list(toRemove):
    string = str(element)
    string = string.split(" ")
    string = string[1].replace("'", '')
    print(string)
    removed = 0
    for i in range(len(dontRemove)):
        if dontRemove[i] in string and removed == 0:
            toRemove.remove(element)
            removed = 1
#removing: 
for i in range(len(toRemove)):
    toRemove[i].clear()

You can check recursively whether an element should be removed: if it contains at least one "nonremovable" child element, it should not be.

dontRemove = ['xxx','yyy']
elements_to_remove = []

def should_not_be_removed(parent):
    # An element whose own tag is protected is always kept.
    if parent.tag in dontRemove:
        return True

    # Otherwise keep it only if some descendant is protected.
    # Every child is visited so that all removable subtrees get recorded.
    nonremovable_child_found = False
    for child in parent:
        if should_not_be_removed(child):
            nonremovable_child_found = True
    if not nonremovable_child_found:
        elements_to_remove.append(parent)
    return nonremovable_child_found

should_not_be_removed(root)

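After this recursive call, started from the root, elements_to_remove holds the elements that are not themselves listed in dontRemove and contain no descendant with a tag listed there.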
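The collected elements still have to be detached from the tree. A minimal sketch of that step follows, assuming the standard-library xml.etree.ElementTree, whose elements carry no reference to their parent, so a child-to-parent map is built first. The names collect_removable, remove_elements and dont_remove, and the inline sample document, are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

# Assumed set of protected tags, mirroring the dontRemove list.
dont_remove = {"xxx", "yyy"}


def collect_removable(element, removable):
    """Return True if element or any of its descendants has a protected tag."""
    if element.tag in dont_remove:
        return True
    # Visit every child (no short-circuit) so all removable subtrees are recorded.
    keep = False
    for child in element:
        if collect_removable(child, removable):
            keep = True
    if not keep:
        removable.append(element)
    return keep


def remove_elements(root, removable):
    """Detach each listed element from its parent; the root has no parent and is skipped."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for element in removable:
        parent = parent_of.get(element)
        if parent is not None:
            parent.remove(element)


# Hypothetical sample document, written without whitespace between tags.
root = ET.fromstring(
    "<a><xxx>keep</xxx><b>drop</b><c><yyy>keep</yyy><d>drop</d></c></a>"
)
removable = []
collect_removable(root, removable)
remove_elements(root, removable)
print(ET.tostring(root, encoding="unicode"))
# prints: <a><xxx>keep</xxx><c><yyy>keep</yyy></c></a>
```

Unlike clear(), which empties an element but leaves its tag in place, remove() takes the element out of its parent entirely. If the root itself ends up in the list, it is left alone here, since it has no parent to be removed from.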

I also extended your XML to cover more test cases; please check whether this is what you meant:

<node1>
    <xxx>
        don't remove
    </xxx>
    <subnode2>
        <yyy>
            don't remove
        </yyy>
    </subnode2>
    <subnode3>
        remove
    </subnode3>
    <subnode4>
        <xxx>
            don't remove
        </xxx>
        <abc>
            remove
        </abc>
    </subnode4>
</node1>
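For this document the expected survivors are node1, both xxx elements, subnode2 with its yyy child, and subnode4. One way to check that is sketched below, using a single-pass variant that removes children while recursing instead of collecting a list first. The names prune, KEEP_TAGS and SAMPLE are hypothetical, and the sample is the XML above in condensed form.

```python
import xml.etree.ElementTree as ET

# Assumed set of protected tags, mirroring the dontRemove list.
KEEP_TAGS = {"xxx", "yyy"}

SAMPLE = """
<node1>
    <xxx>don't remove</xxx>
    <subnode2><yyy>don't remove</yyy></subnode2>
    <subnode3>remove</subnode3>
    <subnode4>
        <xxx>don't remove</xxx>
        <abc>remove</abc>
    </subnode4>
</node1>
"""


def prune(element):
    """Drop children that hold no protected tag; return True if element must stay."""
    if element.tag in KEEP_TAGS:
        return True
    keep = False
    # Iterate over a copy because children are removed during the loop.
    for child in list(element):
        if prune(child):
            keep = True
        else:
            element.remove(child)
    return keep


root = ET.fromstring(SAMPLE)
prune(root)
remaining = [el.tag for el in root.iter()]
print(remaining)
# prints: ['node1', 'xxx', 'subnode2', 'yyy', 'subnode4', 'xxx']
```

The return value of prune(root) tells the caller whether the root itself contains anything protected; the function never removes the root, so that decision is left to the caller.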