Strip a single element in lxml
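I need to remove an XML element while keeping its data. The lxml function strip_tags does remove elements, but it works recursively, and I want to strip a single element.
I tried using the answer on this post, but remove deletes the whole element, including its text.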
from lxml import etree as ET

xml = """
<groceries>
One <fruit state="rotten">apple</fruit> a day keeps the doctor away.
This <fruit state="fresh">pear</fruit> is fresh.
</groceries>
"""
tree = ET.fromstring(xml)
for bad in tree.xpath("//fruit[@state='rotten']"):
    bad.getparent().remove(bad)
print(ET.tostring(tree, pretty_print=True))
I want to get
<groceries>
One apple a day keeps the doctor away.
This <fruit state="fresh">pear</fruit> is fresh.
</groceries>
but I get
<groceries>
This <fruit state="fresh">pear</fruit> is fresh.
</groceries>
I also tried using strip_tags:
for bad in tree.xpath("//fruit[@state='rotten']"):
    ET.strip_tags(bad.getparent(), bad.tag)
<groceries>
One apple a day keeps the doctor away.
This pear is fresh.
</groceries>
But this strips every fruit tag, and I only want to strip the element with state='rotten'.
Maybe someone else has a better idea, but here is a possible workaround:
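from lxml import etree

tree = etree.fromstring(xml)  # xml is the string from the question
bad = tree.xpath(".//fruit[@state='rotten']")[0]  # for simplicity, I didn't bother with a for loop in this case
txt = (bad.text or "") + (bad.tail or "")  # bad.text is 'apple'; bad.tail is the text that follows the closing tag
parent = bad.getparent()
parent.text = (parent.text or "") + txt  # add the collected text to the parent's existing text (assumes bad is the first child)
parent.remove(bad)  # this gets rid only of this specific 'bad' (lxml drops its tail along with it)
print(etree.tostring(tree).decode())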
Output:
<groceries>
One apple a day keeps the doctor away.
This <fruit state="fresh">pear</fruit> is fresh.
</groceries>
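The workaround above appends the text to the parent's text, which is only right when the stripped element is the first child and has no children of its own. A more general version might look like the sketch below. Note that unwrap is a hypothetical helper name, not an lxml function; it relies only on getparent, getprevious, index, insert and remove, and on the fact that lxml's remove drops the element's tail together with the element.

```python
from lxml import etree


def unwrap(elem):
    """Hypothetical helper: drop elem's tag but keep its text, children and tail."""
    parent = elem.getparent()
    if parent is None:
        raise ValueError("cannot unwrap the root element")
    prev = elem.getprevious()
    children = list(elem)

    def append_before(text):
        # Text that precedes elem's position lives in the previous
        # sibling's tail, or in parent.text if elem is the first child.
        if prev is not None:
            prev.tail = (prev.tail or "") + text
        else:
            parent.text = (parent.text or "") + text

    if elem.text:
        append_before(elem.text)
    if elem.tail:
        if children:
            # The tail must follow elem's last child once it is moved up.
            children[-1].tail = (children[-1].tail or "") + elem.tail
        else:
            append_before(elem.tail)

    index = parent.index(elem)
    parent.remove(elem)  # the tail goes away with elem; it was copied above
    for offset, child in enumerate(children):
        parent.insert(index + offset, child)  # moves the child (and its tail)


xml = """
<groceries>
One <fruit state="rotten">apple</fruit> a day keeps the doctor away.
This <fruit state="fresh">pear</fruit> is fresh.
</groceries>
"""
tree = etree.fromstring(xml)
for bad in tree.xpath("//fruit[@state='rotten']"):
    unwrap(bad)
result = etree.tostring(tree).decode()
print(result)
```

For HTML documents parsed with lxml.html, elements already offer a drop_tag() method that does the same job, so a helper like this should only be needed for plain etree elements.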