How to delete a subnode from XML that is not present in a list
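I have an XML document with a varying number of node levels. I want to check every node in the tree and remove it only if neither it nor any of its child nodes is in the list.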
<node1>
    <xxx>
        stuff
    </xxx>
    <subnode2>
        <yyy>
            stuf2
        </yyy>
    </subnode2>
</node1>
My problem is that if 'yyy' is in the dontRemove list but its parent is not, yyy still gets cleared.
import xml.etree.ElementTree as ET

document = ET.parse("foo.xml")
root = document.getroot()
# list of nodes
toRemove = root.findall('.//')
# list of tags that shouldn't be removed
dontRemove = ['xxx', 'yyy']
# take element from root and compare it with "dont remove it", if it's present remove from removing list
for element in list(toRemove):
    string = str(element)
    string = string.split(" ")
    string = string[1].replace("'", '')
    print(string)
    removed = 0
    for i in range(len(dontRemove)):
        if dontRemove[i] in string and removed == 0:
            toRemove.remove(element)
            removed = 1
# removing:
for i in range(len(toRemove)):
    toRemove[i].clear()
You can check recursively whether an element should be removed: it should not be if it contains at least one "nonremovable" child element.
dontRemove = ['xxx', 'yyy']
elements_to_remove = []

def should_not_be_removed(parent):
    if parent.tag in dontRemove:
        return True
    nonremovable_child_found = False
    for child in parent:
        if should_not_be_removed(child):
            nonremovable_child_found = True
    if not nonremovable_child_found:
        elements_to_remove.append(parent)
    return nonremovable_child_found

should_not_be_removed(root)
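After this recursive call, which starts at the root, elements_to_remove contains the list of elements that are not in dontRemove themselves and have no descendants with a tag listed in dontRemove.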
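Collecting the elements does not change the tree by itself. Here is a minimal sketch of one way to do the removal, assuming you want the elements detached from their parents rather than emptied with clear(). The helper name remove_elements and the small inline document are made up for illustration. ElementTree elements do not store a reference to their parent, so the sketch builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

def remove_elements(root, elements_to_remove):
    # Hypothetical helper: map every child to its parent, because
    # ElementTree elements have no built-in parent pointer.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for element in elements_to_remove:
        parent = parent_of.get(element)
        if parent is not None:  # the root has no parent and is left in place
            parent.remove(element)

# Tiny illustrative document, not the one from the question.
root = ET.fromstring("<a><xxx>keep</xxx><b>drop</b></a>")
remove_elements(root, [root.find('b')])
print(ET.tostring(root, encoding='unicode'))
```

In the real script you would pass the elements_to_remove list filled by should_not_be_removed(root) instead of the hand-picked element.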
I also extended your XML to cover more test cases; please check whether this is what you meant:
<node1>
    <xxx>
        don't remove
    </xxx>
    <subnode2>
        <yyy>
            don't remove
        </yyy>
    </subnode2>
    <subnode3>
        remove
    </subnode3>
    <subnode4>
        <xxx>
            don't remove
        </xxx>
        <abc>
            remove
        </abc>
    </subnode4>
</node1>
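To check the behaviour against this document, here is a self-contained sketch that runs the same recursive function on the extended XML and prints the tags it collects. The XML is embedded as a string (the name EXTENDED_XML is only for this example) so that no foo.xml file is needed.

```python
import xml.etree.ElementTree as ET

# The extended test document from above, inlined for the example.
EXTENDED_XML = """
<node1>
    <xxx>don't remove</xxx>
    <subnode2>
        <yyy>don't remove</yyy>
    </subnode2>
    <subnode3>remove</subnode3>
    <subnode4>
        <xxx>don't remove</xxx>
        <abc>remove</abc>
    </subnode4>
</node1>
"""

dontRemove = ['xxx', 'yyy']
elements_to_remove = []

def should_not_be_removed(parent):
    # A protected tag keeps itself and, implicitly, everything below it.
    if parent.tag in dontRemove:
        return True
    nonremovable_child_found = False
    for child in parent:
        if should_not_be_removed(child):
            nonremovable_child_found = True
    # No protected descendant anywhere below: mark this element for removal.
    if not nonremovable_child_found:
        elements_to_remove.append(parent)
    return nonremovable_child_found

root = ET.fromstring(EXTENDED_XML)
should_not_be_removed(root)
print([element.tag for element in elements_to_remove])  # ['subnode3', 'abc']
```

subnode2 and subnode4 are kept because each has a protected child, while subnode3 and abc are collected because nothing below them is protected.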