Delete multiple nodes of the same name

Suppose I have the following XML tree:

my_data.xml
<?xml version="1.0" encoding="UTF-8"?>
<data>    
  <country name="Singapore" xmlns="aaa:bbb:ccc:singapore:eee">
    <continent>Asia</continent>
    <rank updated="yes">5</rank>
    <year>2011</year>
    <gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama" xmlns="aaa:bbb:ccc:panama:eee">
    <rank updated="yes">69</rank>
    <year>2011</year>
    <gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
  <ethnicity xmlns="aaa:bbb:ccc:ethnicity:eee">
    <maylay>
      <holidays>ramadan</holidays>
      <holidays>eid al fitri</holidays>
    </malay>
  </ethnicity>
</data>

Parsing the tree with lxml:

import lxml.etree as etree

xtree = etree.parse('my_data.xml')
xroot = xtree.getroot()
malay_node = xroot.xpath('.//*[local-name()="malay"]')[0]
malay_holiday_nodes = xroot.xpath('.//*[local-name()="holidays"]')

I want to remove all the holidays nodes under the malay node at once. Note that malay_holiday_nodes is a list. If I do this:

malay_node.remove(malay_holiday_nodes)

I get this error:

TypeError: Argument 'element' has incorrect type (expected lxml.etree._Element, got list)

Is there a simple way to remove an entire list of child nodes like this without a for loop? Thanks.
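The TypeError comes from the fact that remove() accepts exactly one element, not a list. Below is a minimal sketch of two workarounds, assuming the maylay/malay closing-tag typo in the sample is fixed so the document parses (a trimmed copy of the sample is inlined so the snippet runs on its own; the helper names are hypothetical). The first is the baseline loop that calls remove() once per node via its parent; the second uses slice deletion (del malay_node[:]), which drops every child of malay in one statement but only matches the goal when holidays are malay's only children.

```python
import lxml.etree as etree

# Trimmed copy of my_data.xml, with the closing tag fixed to </malay>
XML = b"""<data>
  <ethnicity xmlns="aaa:bbb:ccc:ethnicity:eee">
    <malay>
      <holidays>ramadan</holidays>
      <holidays>eid al fitri</holidays>
    </malay>
  </ethnicity>
</data>"""

def remove_with_loop(xml_bytes):
    """Baseline: remove() takes a single element, so call it once per node."""
    root = etree.fromstring(xml_bytes)
    for node in root.xpath('.//*[local-name()="holidays"]'):
        # Ask each node's own parent to remove it, so this works wherever the node sits
        node.getparent().remove(node)
    return root

def remove_with_slice(xml_bytes):
    """No explicit loop: delete the whole child slice of the malay element."""
    root = etree.fromstring(xml_bytes)
    malay_node = root.xpath('.//*[local-name()="malay"]')[0]
    # Removes ALL children of malay, not only holidays
    del malay_node[:]
    return root
```

Both functions return the modified root element, so you can serialize the result with etree.tostring(...) as usual.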

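Consider XSLT, the special-purpose language designed to transform XML files. Specifically, the identity template plus an empty template that matches the children of malay can remove all the desired nodes without a single for loop. Python's lxml library can run XSLT 1.0 scripts.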

XSLT (save as a .xsl file, which is a special XML file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:doc="aaa:bbb:ccc:ethnicity:eee">
    <xsl:output method="xml" encoding="utf-8" indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <!-- IDENTITY TRANSFORM -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- EMPTY TEMPLATE TO REMOVE CONTENT -->
    <xsl:template match="doc:malay/*"/>
</xsl:stylesheet>


Python

import lxml.etree as lx

# PARSE XML AND XSLT
doc = lx.parse("Input.xml")
style = lx.parse("Style.xsl")

# CONFIGURE AND RUN TRANSFORMER
transformer = lx.XSLT(style)
result = transformer(doc)

# OUTPUT TO FILE
with open("Output.xml", "wb") as f:
    f.write(result)
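If you want to try the same transform without creating Input.xml and Style.xsl on disk, here is a minimal sketch that inlines both as byte strings (a trimmed sample with the </malay> tag fixed is assumed) and serializes the result with bytes(), which lxml supports on XSLT result trees.

```python
import lxml.etree as lx

# Same stylesheet as above, inlined so no files are needed
XSL = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:doc="aaa:bbb:ccc:ethnicity:eee">
    <xsl:output method="xml" encoding="utf-8" indent="yes"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="doc:malay/*"/>
</xsl:stylesheet>"""

# Trimmed sample input, with the closing tag fixed to </malay>
XML = b"""<data>
  <ethnicity xmlns="aaa:bbb:ccc:ethnicity:eee">
    <malay>
      <holidays>ramadan</holidays>
      <holidays>eid al fitri</holidays>
    </malay>
  </ethnicity>
</data>"""

transformer = lx.XSLT(lx.XML(XSL))
result = transformer(lx.XML(XML))

# bytes() serializes the result tree using the xsl:output settings
output = bytes(result)
```

The doc prefix only has to be bound to the same namespace URI as malay in the input; the prefix name itself is arbitrary.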

An alternative to XSLT (which I'm a big fan and user of myself) is lxml's strip_elements()...

from lxml import etree

tree = etree.parse("my_data.xml")

etree.strip_elements(tree, "{*}holidays", with_tail=True)

tree.write("output.xml")

Output ("output.xml") using your sample XML, after fixing the maylay/malay tag mismatch...

<data>
  <country xmlns="aaa:bbb:ccc:singapore:eee" name="Singapore">
    <continent>Asia</continent>
    <rank updated="yes">5</rank>
    <year>2011</year>
    <gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country xmlns="aaa:bbb:ccc:panama:eee" name="Panama">
    <rank updated="yes">69</rank>
    <year>2011</year>
    <gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
  <ethnicity xmlns="aaa:bbb:ccc:ethnicity:eee">
    <malay>
      </malay>
  </ethnicity>
</data>
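The "{*}holidays" pattern matches holidays in any namespace. If only the holidays in the ethnicity namespace should go, you can pass the fully qualified "{namespace}tag" name instead. A minimal sketch contrasting the two; the holidays node under Panama is hypothetical, added to the trimmed input only to show the difference in scope, and strip() is an illustrative helper.

```python
from lxml import etree

# Hypothetical input: the Panama <holidays> node is NOT in the original sample
XML = b"""<data>
  <country name="Panama" xmlns="aaa:bbb:ccc:panama:eee">
    <holidays>independence day</holidays>
  </country>
  <ethnicity xmlns="aaa:bbb:ccc:ethnicity:eee">
    <malay>
      <holidays>ramadan</holidays>
      <holidays>eid al fitri</holidays>
    </malay>
  </ethnicity>
</data>"""

ETHNICITY_NS = "aaa:bbb:ccc:ethnicity:eee"

def strip(xml_bytes, tag):
    """Parse the bytes, strip every element matching tag, return the root."""
    root = etree.fromstring(xml_bytes)
    # with_tail=True also drops the whitespace text that follows each removed element
    etree.strip_elements(root, tag, with_tail=True)
    return root

# Wildcard namespace: removes holidays everywhere
everywhere = strip(XML, "{*}holidays")
# Qualified name: removes only holidays in the ethnicity namespace
ethnicity_only = strip(XML, "{%s}holidays" % ETHNICITY_NS)
```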