Remove Empty XML Elements - Python
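I am trying to remove empty XML elements from an XML file, but I am running into trouble with elements that have attributes but no text value. I can remove the empty elements successfully, but I cannot keep the elements with attributes in the final XML. Essentially, I want to clean up the XML and completely remove empty nodes that have no text value, while keeping nodes that have attributes.
Below is the script I am using, along with the input and (desired) output XML. Any help is much appreciated!
Script: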
from lxml import etree
import os

# Raw string, so the backslashes are not read as escape sequences
path = r"C:\users\mdl518\Desktop"

### Removing empty XML elements
tree = etree.parse(os.path.join(path, "my_file.xml"))
for elem in tree.xpath('//*[not(node())]'):
    elem.getparent().remove(elem)

with open("./new_file.xml", "wb") as f:
    f.write(etree.tostring(tree, xml_declaration=True, encoding='utf-8'))  ## Removes empty XML elements, including the elements with attributes
Input XML:
<?xml version='1.0' encoding='utf-8'?>
<nas:metadata xmlns:nas="http://www.arcgis.com/schema/nas/base"
xmlns:mcc="http://standards.org/iso/19115/-3/mcc/1.0"
xmlns:mdl="http://standards.org/iso/19115/-3/mdl/1.0"
xmlns:mnl="http://standards.org/iso/19115/-3/mnl/1.0">
xmlns:lan="http://standards.org/iso/19115/-3/lan/1.0">
xmlns:lis="http://standards.org/iso/19115/-3/lis/1.0">
xmlns:gam="http://standards.org/iso/19115/-3/gam/1.0">
<mdl:metadataIdentifier>
<mcc:MD_Identifier>
<mnl:type>
<gam:String>The Metadata File</gam:String>
</mnl:type>
<mnl:description codeList="http://arcgis.com/codelist/ScopeCode" codeListValue="dataset"/>
<mnl:address>
<mnl:defaultLocale>
</mnl:defaultLocale>
</mnl:address>
<lan:language>
<lan:type>
<lis:name>English</lis:name>
</lan:type>
</lan:language>
</mcc:MD_Identifier>
<mcc:contactInfo>
<mdl:POC>
<mnl:name>
<lis:person>Tom</lis:person>
</mnl:name>
<mnl:age>
</mnl:age>
<mnl:status>
</mnl:status>
</mdl:POC>
</mcc:contactInfo>
</mdl:metadataIdentifier>
</nas:metadata>
Output XML:
<?xml version='1.0' encoding='utf-8'?>
<nas:metadata xmlns:nas="http://www.arcgis.com/schema/nas/base"
xmlns:mcc="http://standards.org/iso/19115/-3/mcc/1.0"
xmlns:mdl="http://standards.org/iso/19115/-3/mdl/1.0"
xmlns:mnl="http://standards.org/iso/19115/-3/mnl/1.0">
xmlns:lan="http://standards.org/iso/19115/-3/lan/1.0">
xmlns:lis="http://standards.org/iso/19115/-3/lis/1.0">
xmlns:gam="http://standards.org/iso/19115/-3/gam/1.0">
<mdl:metadataIdentifier>
<mcc:MD_Identifier>
<mnl:type>
<gam:String>The Metadata File</gam:String>
</mnl:type>
<mnl:description codeList="http://arcgis.com/codelist/ScopeCode" codeListValue="dataset"/>
<lan:language>
<lan:type>
<lis:name>English</lis:name>
</lan:type>
</lan:language>
</mcc:MD_Identifier>
<mcc:contactInfo>
<mdl:POC>
<mnl:name>
<lis:person>Tom</lis:person>
</mnl:name>
</mdl:POC>
</mcc:contactInfo>
</mdl:metadataIdentifier>
</nas:metadata>
The XML in your question is not well-formed, but assuming that is fixed, try changing this line
for elem in tree.xpath('//*[not(node())]'):
to this:
for elem in tree.xpath('//*[not(node())][not(count(./@*))>0]'):
and see if it works.
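To see what the extra predicate changes, here is a minimal sketch on a made-up document (the element names `a`, `b`, `c`, `d` and the helper `names` are illustrative, not from the question). It assumes lxml is installed. `not(@*)` is used as a shorter spelling of the attribute test; as far as I can tell it selects the same elements as `not(count(./@*))>0`.

```python
from lxml import etree

# Hypothetical sample: <a> has only an attribute, <b> is truly empty,
# <c> contains nothing but a line break, <d> has real text.
doc = etree.XML('<root><a x="1"/><b/><c>\n</c><d>text</d></root>')

def names(xpath):
    """Return the tag names selected by an XPath expression, in document order."""
    return [el.tag for el in doc.xpath(xpath)]

# No child nodes at all: matches <a> (attributes are not child nodes) and <b>,
# but not <c>, whose line break is a text node.
no_nodes = names('//*[not(node())]')

# Same test, but skipping elements that carry attributes.
no_nodes_no_attrs = names('//*[not(node())][not(@*)]')

# The spelling of the attribute test used in this answer.
original_form = names('//*[not(node())][not(count(./@*))>0]')

# Treat whitespace-only content as empty as well.
blank_no_attrs = names('//*[not(@*)][not(normalize-space(.))]')

print(no_nodes, no_nodes_no_attrs, original_form, blank_no_attrs)
```

The point of the last expression is that elements written as an opening tag, a line break, and a closing tag still contain a text node, so `not(node())` alone does not select them.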
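Edit:
The edited XML in the question is still not well-formed: the xmlns:mnl, xmlns:lan and xmlns:lis declarations each end with a stray ">". I fixed that and then applied the following: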
from lxml import etree

xml_str = """<?xml version='1.0' encoding='utf-8'?>
<nas:metadata xmlns:nas="http://www.arcgis.com/schema/nas/base"
xmlns:mcc="http://standards.org/iso/19115/-3/mcc/1.0"
xmlns:mdl="http://standards.org/iso/19115/-3/mdl/1.0"
xmlns:mnl="http://standards.org/iso/19115/-3/mnl/1.0"
xmlns:lan="http://standards.org/iso/19115/-3/lan/1.0"
xmlns:lis="http://standards.org/iso/19115/-3/lis/1.0"
xmlns:gam="http://standards.org/iso/19115/-3/gam/1.0">
<mdl:metadataIdentifier>
<mcc:MD_Identifier>
<mnl:type>
<gam:String>The Metadata File</gam:String>
</mnl:type>
<mnl:description codeList="http://arcgis.com/codelist/ScopeCode" codeListValue="dataset"/>
<mnl:address>
<mnl:defaultLocale>
</mnl:defaultLocale>
</mnl:address>
<lan:language>
<lan:type>
<lis:name>English</lis:name>
</lan:type>
</lan:language>
</mcc:MD_Identifier>
<mcc:contactInfo>
<mdl:POC>
<mnl:name>
<lis:person>Tom</lis:person>
</mnl:name>
<mnl:age>
</mnl:age>
<mnl:status>
</mnl:status>
</mdl:POC>
</mcc:contactInfo>
</mdl:metadataIdentifier>
</nas:metadata>
"""
doc = etree.XML(xml_str.encode())
# Select elements that have no attributes and no non-whitespace text anywhere inside them
for elem in doc.xpath('//*[not(count(./@*))>0][not(normalize-space(.))]'):
    elem.getparent().remove(elem)
print(etree.tostring(doc, xml_declaration=True, encoding='utf-8').decode())
The output I get from the above is the desired output shown in the question.
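One case the XPath above does not cover: a wrapper element with no text and no attributes of its own, whose child does carry attributes, also has an empty `normalize-space(.)`, so it would be removed together with that child. That does not happen with the XML in the question, but if it matters for your data, a bottom-up walk is one way to handle it. The following is only a sketch, and it swaps lxml for the standard library's `xml.etree.ElementTree`; the function name `prune_empty` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

def prune_empty(elem):
    """Remove descendants that have no attributes, no non-blank text and no
    surviving children. Returns True if elem itself is worth keeping."""
    for child in list(elem):  # iterate over a copy, since elem is modified
        has_tail = bool(child.tail and child.tail.strip())
        # Keep a child whose tail holds real text, so that text is not lost.
        if not prune_empty(child) and not has_tail:
            elem.remove(child)
    has_text = bool(elem.text and elem.text.strip())
    return bool(elem.attrib) or has_text or len(elem) > 0

# Hypothetical sample document
root = ET.fromstring(
    '<root>'
    '<wrap><item code="x"/></wrap>'      # kept: the child carries an attribute
    '<addr><locale>\n</locale></addr>'   # removed: nothing but whitespace
    '<name>Tom</name>'
    '</root>'
)
prune_empty(root)
result = ET.tostring(root, encoding='unicode')
print(result)
```

Because children are processed before their parent, an element is dropped only after everything beneath it has been judged empty. Note that ElementTree may rename namespace prefixes on output unless they are registered with `ET.register_namespace`, so for the namespaced XML in the question lxml is probably the more convenient choice.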