Remove parent node depending on the child element values
I have many XML files like the sample input file below.
What I want is to remove every <b> node whose child elements do not contain the value banana.
<a>
<header>
fruit
</header>
<b>
<fruitlist>
<d>banana</d>
</fruitlist>
<fruitlist>
<d>apple</d>
</fruitlist>
</b>
<b>
<fruitlist>
<d>lemon</d>
</fruitlist>
<fruitlist>
<d>tomato</d>
</fruitlist>
</b>
<b>
<fruitlist>
<d>banana</d>
</fruitlist>
</b>
<b>
<fruitlist>
<d>lemon</d>
</fruitlist>
<fruitlist>
<d>kiwi</d>
</fruitlist>
</b>
<b>
<fruitlist>
<d>strawberry</d>
</fruitlist>
</b>
</a>
This is what I want:
<a>
<header>
fruit
</header>
<b>
<fruitlist>
<d>banana</d>
</fruitlist>
<fruitlist>
<d>apple</d>
</fruitlist>
</b>
<b>
<fruitlist>
<d>banana</d>
</fruitlist>
</b>
</a>
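My code looks like this:
import glob
import os
import xml.etree.ElementTree as ET

def removebanana(directories):
    xmlFiles = directories + "/*.xml"
    dirloc = directories + "/result"
    # counters (initial values assumed; they were not shown in the post)
    cnt = 0
    num = 0
    for fname in glob.glob(xmlFiles):
        name = os.path.basename(fname)
        content = open(fname, "rt", encoding="utf-8", errors="ignore")
        tree = ET.parse(content)  # assumed; the parse step was missing from the post
        root = tree.getroot()
        for b in root.findall("b"):
            dlist = []
            # find("d") only looks at direct children of <b>
            if b.find("d") is not None:
                d = str(b.find("d").text)
                dlist.append(d)
            for dd in dlist:
                dd = dd.strip()
                if dd.lower() == "banana":
                    cnt += 1
            if cnt == 0:
                root.remove(b)
                num += 0
        filename = f"{dirloc}/{name}"
        cnt += 1
        tree.write(filename)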
However, the result is identical to the sample input file.
If I understand you correctly, this is what you need to do:
import xml.etree.ElementTree as ET

fruits = """[your XML above]"""

tree = ET.fromstring(fruits)  # fromstring returns the root element <a>
targets = tree.findall('.//b')
for target in targets:
    # collect the text of every <d> anywhere below this <b>
    f_list = [t.text for t in target.findall('.//d')]
    if "banana" not in f_list:
        tree.remove(target)
print(ET.tostring(tree).decode())

# to write to file:
tree = ET.ElementTree(tree)
tree.write("test.xml", encoding="utf-8")
Output:
<a>
<header>
fruit
</header>
<b>
<fruitlist>
<d>banana</d>
</fruitlist>
<fruitlist>
<d>apple</d>
</fruitlist>
</b>
<b>
<fruitlist>
<d>banana</d>
</fruitlist>
</b>
</a>
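Since the question mentions many XML files, here is a rough sketch of how the same idea could be applied to a whole directory. The names `keep_banana_blocks` and `filter_directory`, and the `src_dir`/`out_dir` parameters, are made up for illustration and are not from the original post. It also assumes that every <b> is a direct child of the root element (as in the sample), and that ignoring surrounding whitespace and letter case in <d> is acceptable.

```python
import glob
import os
import xml.etree.ElementTree as ET


def keep_banana_blocks(root, wanted="banana"):
    """Remove each direct <b> child of root that has no <d> equal to wanted.

    Returns the number of <b> elements removed.
    """
    removed = 0
    # findall returns a list, so removing from root inside the loop is safe
    for b in root.findall("b"):
        # ".//d" searches all descendants, so <d> inside <fruitlist> is found
        values = [(d.text or "").strip().lower() for d in b.findall(".//d")]
        if wanted not in values:
            root.remove(b)
            removed += 1
    return removed


def filter_directory(src_dir, out_dir):
    """Filter every *.xml file in src_dir and write the result into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for path in glob.glob(os.path.join(src_dir, "*.xml")):
        tree = ET.parse(path)
        keep_banana_blocks(tree.getroot())
        tree.write(os.path.join(out_dir, os.path.basename(path)), encoding="utf-8")
```

It could then be called as, for example, `filter_directory("xmls", "xmls/result")`, where both paths are placeholders. Keeping the filtering in its own function makes it easy to try on a single parsed document before running it over a whole folder.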