Remove XML Parent Elements Based on Condition of Child Element - Python
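I am trying to delete parent XML elements based on the text of a specific child element that contains the value "nan". The input XML uses namespaces, which makes this trickier than I expected: I can remove selected child elements on their own, but not their associated/adjacent parent elements. So far I can only remove the "nan" values that belong to gam:String elements, but I want to remove every child element whose text is "nan" together with its associated parent elements.
Below are the script I'm using and the input and (desired) output XMLs. Any help is much appreciated!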
Script:
from lxml import etree
import os

# Raw string: in a normal string "\u" starts a unicode escape and a trailing \" swallows the closing quote
path = r"C:\users\mdl518\Desktop"

### Removing "Nan" Values
tree = etree.parse(os.path.join(path, "metadata_info.xml"))
for elem in tree.findall('.//{http://standards.iso.org/iso/19115/-3/gam/1.0}String'):
    if elem.text == 'nan':
        parent = elem.getparent()
        parent.remove(elem)
with open(".//metadata_output.xml", "wb") as f:
    f.write(etree.tostring(tree, xml_declaration=True, encoding='utf-8'))  ## Removes elements with "nan" values
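Editor's note on the namespace difficulty: findall matches tags by their full {namespace-uri}localname (Clark notation). In the sample as posted, the script looks for http://standards.iso.org/iso/19115/-3/gam/1.0, while the input XML binds gam to http://standards.org/iso/19115/-3/gam/1.0, so against that sample the script would match nothing. The sketch below illustrates this with the standard library's xml.etree.ElementTree instead of lxml, on a trimmed, hypothetical sample; the {*} namespace wildcard it uses assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample: gam is bound to the URI used in the
# question's sample XML (standards.org), not the one in the script.
SAMPLE = """<root xmlns:gam="http://standards.org/iso/19115/-3/gam/1.0">
  <age><gam:String>nan</gam:String></age>
  <type><gam:String>The Metadata File</gam:String></type>
</root>"""

root = ET.fromstring(SAMPLE)

# Clark notation with the script's URI: no element carries this exact tag.
script_uri = ".//{http://standards.iso.org/iso/19115/-3/gam/1.0}String"
print(len(root.findall(script_uri)))  # 0

# Clark notation with the URI the document actually declares.
doc_uri = ".//{http://standards.org/iso/19115/-3/gam/1.0}String"
print(len(root.findall(doc_uri)))  # 2

# Python 3.8+: {*} matches the local name in any namespace.
print(len(root.findall(".//{*}String")))  # 2
```

Matching on the local name alone sidesteps URI typos, at the cost of also matching a String element from any other namespace.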
Input XML:
<?xml version='1.0' encoding='utf-8'?>
<nas:metadata xmlns:nas="http://www.arcgis.com/schema/nas/base"
              xmlns:mcc="http://standards.org/iso/19115/-3/mcc/1.0"
              xmlns:mdl="http://standards.org/iso/19115/-3/mdl/1.0"
              xmlns:mnl="http://standards.org/iso/19115/-3/mnl/1.0"
              xmlns:lan="http://standards.org/iso/19115/-3/lan/1.0"
              xmlns:lis="http://standards.org/iso/19115/-3/lis/1.0"
              xmlns:gam="http://standards.org/iso/19115/-3/gam/1.0">
  <mdl:metadataIdentifier>
    <mcc:MD_Identifier>
      <mnl:name>
        <mnl:type>
          <gam:String>The Metadata File</gam:String>
        </mnl:type>
        <mnl:description>
          <mcc:listing codeList="http://arcgis.com/codelist/ScopeCode" codeListValue="dataset"></mcc:listing>
        </mnl:description>
      </mnl:name>
      <mnl:address>
        <mnl:defaultLocale>
          <lan:location>nan</lan:location>
        </mnl:defaultLocale>
      </mnl:address>
      <lan:language>
        <lan:type>
          <lis:name>English</lis:name>
        </lan:type>
      </lan:language>
    </mcc:MD_Identifier>
    <mcc:contactInfo>
      <mdl:POC>
        <mnl:name>
          <lis:person>Tom</lis:person>
        </mnl:name>
        <mnl:age>
          <gam:String>nan</gam:String>
        </mnl:age>
        <mnl:status>
          <lis:employment>nan</lis:employment>
        </mnl:status>
      </mdl:POC>
    </mcc:contactInfo>
  </mdl:metadataIdentifier>
</nas:metadata>
Desired output XML:
<?xml version='1.0' encoding='utf-8'?>
<nas:metadata xmlns:nas="http://www.arcgis.com/schema/nas/base"
              xmlns:mcc="http://standards.org/iso/19115/-3/mcc/1.0"
              xmlns:mdl="http://standards.org/iso/19115/-3/mdl/1.0"
              xmlns:mnl="http://standards.org/iso/19115/-3/mnl/1.0"
              xmlns:lan="http://standards.org/iso/19115/-3/lan/1.0"
              xmlns:lis="http://standards.org/iso/19115/-3/lis/1.0"
              xmlns:gam="http://standards.org/iso/19115/-3/gam/1.0">
  <mdl:metadataIdentifier>
    <mcc:MD_Identifier>
      <mnl:name>
        <mnl:type>
          <gam:String>The Metadata File</gam:String>
        </mnl:type>
        <mnl:description>
          <mcc:listing codeList="http://arcgis.com/codelist/ScopeCode" codeListValue="dataset"></mcc:listing>
        </mnl:description>
      </mnl:name>
      <lan:language>
        <lan:type>
          <lis:name>English</lis:name>
        </lan:type>
      </lan:language>
    </mcc:MD_Identifier>
    <mcc:contactInfo>
      <mdl:POC>
        <mnl:name>
          <lis:person>Tom</lis:person>
        </mnl:name>
      </mdl:POC>
    </mcc:contactInfo>
  </mdl:metadataIdentifier>
</nas:metadata>
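This has to be done in two stages: first remove every node whose text is nan, then go through the nodes that the first step left empty and remove those as well. Because //* matches elements in any namespace, no namespace mapping is needed: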
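from lxml import etree

tree = etree.parse("metadata_info.xml")

# Step 1 - remove every element whose string value is exactly "nan"
for n in tree.xpath('//*[.="nan"]'):
    n.getparent().remove(n)

# Step 2 - select the elements left without any text and remove them as well.
# The attribute test keeps attribute-only elements such as mcc:listing
# (and therefore mnl:description), which the desired output retains.
empty = tree.xpath('//*[not(normalize-space()) and not(descendant-or-self::*/@*)]')
for emp in empty:
    parent = emp.getparent()
    # A removed ancestor keeps its own subtree intact, so only the
    # root element (which has no parent) ever needs to be skipped
    if parent is not None:
        parent.remove(emp)

print(etree.tostring(tree).decode())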
This should produce your desired output.
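The two-stage answer removes the "nan" leaves and then sweeps the whole document for empty elements. A different technique, sketched here as an alternative rather than as the answer's method, prunes upward from each "nan" leaf: remove it, then keep removing its ancestors for as long as each one is left with no children, no attributes and only whitespace text. It uses the standard library's xml.etree.ElementTree (an assumption, so the sketch runs without lxml); since ElementTree elements have no getparent(), it builds a child-to-parent map first. The helper names prune_nan and is_empty are hypothetical, and the sample is a trimmed version of the question's input.

```python
import xml.etree.ElementTree as ET


def is_empty(elem):
    """True if elem has no children, no attributes and only whitespace text."""
    return len(elem) == 0 and not elem.attrib and not (elem.text or "").strip()


def prune_nan(root, value="nan"):
    """Remove each leaf whose text is `value`, then remove every ancestor the
    removal left empty. Returns the number of elements removed."""
    # ElementTree has no getparent(), so record each child's parent up front.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Collect the matching leaves before mutating the tree.
    targets = [e for e in root.iter()
               if len(e) == 0 and (e.text or "").strip() == value]
    removed = 0
    for elem in targets:
        parent = parent_of.get(elem)
        # Climb while there is a parent to detach the current node from.
        while parent is not None:
            parent.remove(elem)
            removed += 1
            if not is_empty(parent):
                break
            elem, parent = parent, parent_of.get(parent)
    return removed


SAMPLE = """<mdl:metadataIdentifier
    xmlns:mcc="http://standards.org/iso/19115/-3/mcc/1.0"
    xmlns:mdl="http://standards.org/iso/19115/-3/mdl/1.0"
    xmlns:mnl="http://standards.org/iso/19115/-3/mnl/1.0"
    xmlns:lan="http://standards.org/iso/19115/-3/lan/1.0"
    xmlns:lis="http://standards.org/iso/19115/-3/lis/1.0"
    xmlns:gam="http://standards.org/iso/19115/-3/gam/1.0">
  <mcc:MD_Identifier>
    <mnl:description>
      <mcc:listing codeListValue="dataset"></mcc:listing>
    </mnl:description>
    <mnl:address>
      <mnl:defaultLocale>
        <lan:location>nan</lan:location>
      </mnl:defaultLocale>
    </mnl:address>
  </mcc:MD_Identifier>
  <mcc:contactInfo>
    <mdl:POC>
      <mnl:name><lis:person>Tom</lis:person></mnl:name>
      <mnl:age><gam:String>nan</gam:String></mnl:age>
      <mnl:status><lis:employment>nan</lis:employment></mnl:status>
    </mdl:POC>
  </mcc:contactInfo>
</mdl:metadataIdentifier>"""

root = ET.fromstring(SAMPLE)
count = prune_nan(root)
# Local names of the elements that survive, in document order.
local_names = [e.tag.split("}")[-1] for e in root.iter()]
print(count)  # 7
print(local_names)
# ['metadataIdentifier', 'MD_Identifier', 'description', 'listing',
#  'contactInfo', 'POC', 'name', 'person']
```

Because pruning only ever touches ancestors of a removed leaf, an element that was already empty elsewhere in the document is left alone, whereas a document-wide sweep would delete it too.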