How to remove a specific tag that could be empty in an XML file

Question

I'm trying to remove a specific tag from an XML file, but only if it is empty.

File:

<?xml version="1.0" encoding="utf-8"?>
<parent>
  <child>
    <value1>Foo</value1>
    <value2>Bar</value2>
    <value3>Hello World</value3>
    <value3/>
    <value3/>
    <value3/>
  </child>
</parent>

Expected output:

<?xml version="1.0" encoding="utf-8"?>
<parent>
  <child>
    <value1>Foo</value1>
    <value2>Bar</value2>
    <value3>Hello World</value3>
  </child>
</parent>

I'm having trouble reading the file and parsing it with lxml, so I'm open to any other Python 3 methods or modules. Ideally, I'd like the code to work like this:

def remove_empty_tag(tag, data):
   ...

data = open("file.xml").read()
# Pass both arguments by keyword; a positional argument may not follow a keyword one.
new_xml = remove_empty_tag(tag="value3", data=data)
print(new_xml)

But I'd welcome any help, or even a pointer in the right direction.

Answer 1

from lxml import etree


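def remove_empty_tag(tag, original_file, new_file):
    # Read the whole file as text (str) and close the handle afterwards.
    with open(original_file, 'r', encoding='utf8') as f:
        data = f.read()
    root = etree.fromstring(data)
    # Select every element that has no child nodes at all (no text, no children).
    for element in root.xpath(".//*[not(node())]"):
        if element.tag == tag:
            element.getparent().remove(element)
    # tostring() returns bytes, so the output file is opened in binary mode.
    with open(new_file, 'wb') as f:
        f.write(etree.tostring(root, pretty_print=True))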


remove_empty_tag("value3", "old.xml", "new.xml")

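This is what I was trying to achieve. For some reason, lxml complained about the file data whenever it contained <?xml version="1.0" encoding="utf-8"?>, so simply removing that line fixed it. This is not really a duplicate, because the answers in the other thread didn't show how to remove only one specific empty tag, didn't explain what the code is actually doing, and didn't show how to write the result to a new file without stray '\n' everywhere...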
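
The complaint mentioned above is most likely lxml's ValueError for Unicode strings that carry an encoding declaration: etree.fromstring() accepts such a document as bytes but not as str. Below is a minimal sketch of a workaround that keeps the declaration in the file. The helper name parse_xml is hypothetical, and re-encoding as UTF-8 assumes the declaration really says utf-8:

```python
from lxml import etree

XML_TEXT = '<?xml version="1.0" encoding="utf-8"?>\n<parent><child/></parent>'


def parse_xml(data):
    """Parse XML given as str or bytes (hypothetical helper)."""
    if isinstance(data, str):
        # Assumption: the document declares utf-8, so encoding the text as
        # UTF-8 yields bytes that match the declaration.
        data = data.encode("utf-8")
    return etree.fromstring(data)


root = parse_xml(XML_TEXT)
print(root.tag)
```

Opening the file with open(path, 'rb') and passing the bytes straight to etree.fromstring() should have the same effect.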

Answer 2

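You don't need open() to read or write the file; use lxml's parse() to parse the file and write() to write the new one.

You should also be able to use the self:: XPath axis instead of a Python if statement to check the tag name.

Example...

XML input (old.xml)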

<parent>
  <child>
    <value1>Foo</value1>
    <value2>Bar</value2>
    <value3>Hello World</value3>
    <value3/>
    <value3/>
    <value3/>
  </child>
</parent>

Python

from lxml import etree


def remove_empty_tag(tag, original_file, new_file):
    root = etree.parse(original_file)
    for element in root.xpath(f".//*[self::{tag} and not(node())]"):
        element.getparent().remove(element)

    # Serialize "root" and create a new tree using an XMLParser to clean up
    # formatting caused by removing elements.
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.fromstring(etree.tostring(root), parser=parser)
    # Write to new file.
    etree.ElementTree(tree).write(new_file, pretty_print=True, xml_declaration=True, encoding="utf-8")


remove_empty_tag("value3", "old.xml", "new.xml")

XML output (new.xml)

<?xml version='1.0' encoding='UTF-8'?>
<parent>
  <child>
    <value1>Foo</value1>
    <value2>Bar</value2>
    <value3>Hello World</value3>
  </child>
</parent>

Note: serializing and creating a new tree isn't strictly necessary. You could do this instead:

root.write(new_file, pretty_print=True, xml_declaration=True, encoding="utf-8")

but the output formatting will be slightly different (note the extra indentation of the child end tag):

<?xml version='1.0' encoding='UTF-8'?>
<parent>
  <child>
    <value1>Foo</value1>
    <value2>Bar</value2>
    <value3>Hello World</value3>
    </child>
</parent>
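
The extra indentation appears because the whitespace that follows an element is stored as that element's tail, and remove() drops the tail together with the element, so the last remaining sibling keeps its old newline-plus-four-spaces tail in front of the child end tag. Below is a minimal sketch of one way to avoid the re-parse, written with the standard library's xml.etree.ElementTree (which uses the same text/tail model) rather than lxml. The string-in, string-out signature follows the question, and handing the removed element's tail to the preceding node is an illustrative choice, not what the answer above does:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<parent>
  <child>
    <value1>Foo</value1>
    <value3>Hello World</value3>
    <value3/>
    <value3/>
  </child>
</parent>"""


def remove_empty_tag(tag, data):
    """Return data with every empty <tag> element removed (sketch)."""
    root = ET.fromstring(data)
    # Snapshot the elements first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        previous = None  # last sibling that was kept
        for child in list(parent):
            if child.tag == tag and len(child) == 0 and not child.text:
                # Hand the removed element's trailing whitespace to whatever
                # now precedes the gap, so the indentation stays consistent.
                if previous is None:
                    parent.text = child.tail
                else:
                    previous.tail = child.tail
                parent.remove(child)
            else:
                previous = child
    return ET.tostring(root, encoding="unicode")


print(remove_empty_tag("value3", SAMPLE))
```

Unlike the XPath test not(node()), this check treats an element as empty when it has no child elements and no text, and the returned string carries no XML declaration.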
