How to extract part of an XML file
I have a large XML file like the one below. Basically, I want to extract the part of the file that contains a given element, for example <ManagedElementId string = "rbs064841"/>.
<Model version = "1" importVersion = "12.2">
<Create>
<SubNetwork networkType = "WRAN" userLabel="AHPTUR14">
<ManagedElement sourceType = "CELLO">
<ManagedElementId string = "rbs064841"/>
<primaryType type = "RBS"/>
<managedElementType types = ""/>
<associatedSite string = "Site=site06484"/>
<nodeVersion string = "W12B"/>
<platformVersion string = "Cello 12.2"/>
<swVersion string = ""/>
<vendorName string = "ERICSSON"/>
<userDefinedState string = ""/>
<managedServiceAvailability int = "1"/>
<isManaged boolean = "true"/>
<neMIMVersion string = "vS.1.150"/>
<connectionStatus string = "ON"/>
</ManagedElement>
</SubNetwork>
<SubNetwork networkType = "WRAN" userLabel = "AHPT78">
<ManagedElement sourceType = "CELLO">
<ManagedElementId string = "rbs04798"/>
<primaryType type = "RBS"/>
<managedElementType types = ""/>
<associatedSite string = "Site=site06484"/>
<nodeVersion string = "W12B"/>
<platformVersion string = "Cello 12.2"/>
<swVersion string = ""/>
<vendorName string = "ERICSSON"/>
<userDefinedState string = ""/>
<managedServiceAvailability int = "1"/>
<isManaged boolean = "true"/>
<neMIMVersion string = "vS.1.150"/>
<connectionStatus string = "ON"/>
</ManagedElement>
</SubNetwork>
<SubNetwork networkType = "WRAN" userLabel = "AHPT4">
<ManagedElement sourceType = "CELLO">
<ManagedElementId string = "rbs04456"/>
<primaryType type = "RBS"/>
<managedElementType types = ""/>
<associatedSite string = "Site=site06484"/>
<nodeVersion string = "W12B"/>
<platformVersion string = "Cello 12.2"/>
<swVersion string = ""/>
<vendorName string = "ERICSSON"/>
<userDefinedState string = ""/>
<managedServiceAvailability int = "1"/>
<isManaged boolean = "true"/>
<neMIMVersion string = "vS.1.150"/>
<connectionStatus string = "ON"/>
</ManagedElement>
</SubNetwork>
</Create>
</Model>
That is, after parsing I want to extract this part:
<SubNetwork networkType = "WRAN" userLabel="AHPTUR14">
<ManagedElement sourceType = "CELLO">
<ManagedElementId string = "rbs064841"/>
<primaryType type = "RBS"/>
<managedElementType types = ""/>
<associatedSite string = "Site=site06484"/>
<nodeVersion string = "W12B"/>
<platformVersion string = "Cello 12.2"/>
<swVersion string = ""/>
<vendorName string = "ERICSSON"/>
<userDefinedState string = ""/>
<managedServiceAvailability int = "1"/>
<isManaged boolean = "true"/>
<neMIMVersion string = "vS.1.150"/>
<connectionStatus string = "ON"/>
</ManagedElement>
</SubNetwork>
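So I want to search the large XML file by ManagedElementId and, when it is found, extract the part it belongs to, that is, everything from <SubNetwork> to </SubNetwork>.
I know how to extract data from an XML file, but I don't know how to extract a part of the file itself. I am using Python's ElementTree.
Any suggestions would be helpful.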
Use find with an XPath expression, then step up to the relative parent node, like this:
s = '''<Model version = "1" importVersion = "12.2">
<Create>
<SubNetwork networkType = "WRAN" userLabel="AHPTUR14">
<ManagedElement sourceType = "CELLO">
<ManagedElementId string = "rbs064841"/>
<primaryType type = "RBS"/>
<managedElementType types = ""/>
<associatedSite string = "Site=site06484"/>
<nodeVersion string = "W12B"/>
<platformVersion string = "Cello 12.2"/>
<swVersion string = ""/>
<vendorName string = "ERICSSON"/>
<userDefinedState string = ""/>
<managedServiceAvailability int = "1"/>
<isManaged boolean = "true"/>
<neMIMVersion string = "vS.1.150"/>
<connectionStatus string = "ON"/>
</ManagedElement>
</SubNetwork>
<SubNetwork networkType = "WRAN" userLabel = "AHPT78">
<ManagedElement sourceType = "CELLO">
<ManagedElementId string = "rbs04798"/>
<primaryType type = "RBS"/>
<managedElementType types = ""/>
<associatedSite string = "Site=site06484"/>
<nodeVersion string = "W12B"/>
<platformVersion string = "Cello 12.2"/>
<swVersion string = ""/>
<vendorName string = "ERICSSON"/>
<userDefinedState string = ""/>
<managedServiceAvailability int = "1"/>
<isManaged boolean = "true"/>
<neMIMVersion string = "vS.1.150"/>
<connectionStatus string = "ON"/>
</ManagedElement>
</SubNetwork>
<SubNetwork networkType = "WRAN" userLabel = "AHPT4">
<ManagedElement sourceType = "CELLO">
<ManagedElementId string = "rbs04456"/>
<primaryType type = "RBS"/>
<managedElementType types = ""/>
<associatedSite string = "Site=site06484"/>
<nodeVersion string = "W12B"/>
<platformVersion string = "Cello 12.2"/>
<swVersion string = ""/>
<vendorName string = "ERICSSON"/>
<userDefinedState string = ""/>
<managedServiceAvailability int = "1"/>
<isManaged boolean = "true"/>
<neMIMVersion string = "vS.1.150"/>
<connectionStatus string = "ON"/>
</ManagedElement>
</SubNetwork>
</Create>
</Model>'''
# I'd prefer lxml, but since you're using the standard xml module...
import xml.etree.ElementTree as ET
tree = ET.fromstring(s)
# The SubNetwork node you want is the parent of the parent of ManagedElementId,
# so step up exactly two levels with "/../.." (no trailing slash).
node = tree.find('.//ManagedElementId[@string="rbs064841"]/../..')
print(ET.tostring(node, encoding="unicode"))
<SubNetwork networkType="WRAN" userLabel="AHPTUR14">
<ManagedElement sourceType="CELLO">
<ManagedElementId string="rbs064841"/>
<primaryType type="RBS"/>
<managedElementType types=""/>
<associatedSite string="Site=site06484"/>
<nodeVersion string="W12B"/>
<platformVersion string="Cello 12.2"/>
<swVersion string=""/>
<vendorName string="ERICSSON"/>
<userDefinedState string=""/>
<managedServiceAvailability int="1"/>
<isManaged boolean="true"/>
<neMIMVersion string="vS.1.150"/>
<connectionStatus string="ON"/>
</ManagedElement>
</SubNetwork>
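If the ".." steps feel fragile, the same lookup can be sketched without them by walking each SubNetwork and checking its descendants. This is only an illustration: find_subnetwork and the shortened SAMPLE string are names made up here, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Shortened version of the document from the question (assumption: same structure).
SAMPLE = """<Model version="1" importVersion="12.2">
<Create>
<SubNetwork networkType="WRAN" userLabel="AHPTUR14">
<ManagedElement sourceType="CELLO">
<ManagedElementId string="rbs064841"/>
</ManagedElement>
</SubNetwork>
<SubNetwork networkType="WRAN" userLabel="AHPT78">
<ManagedElement sourceType="CELLO">
<ManagedElementId string="rbs04798"/>
</ManagedElement>
</SubNetwork>
</Create>
</Model>"""


def find_subnetwork(root, element_id):
    """Return the first SubNetwork containing the given ManagedElementId, or None."""
    for subnetwork in root.iter("SubNetwork"):
        for managed_id in subnetwork.iter("ManagedElementId"):
            if managed_id.get("string") == element_id:
                return subnetwork
    return None


root = ET.fromstring(SAMPLE)
match = find_subnetwork(root, "rbs04798")
if match is not None:
    print(ET.tostring(match, encoding="unicode"))
    # To save the extracted part to its own file (hypothetical filename):
    # ET.ElementTree(match).write("subnetwork.xml")
```

Returning None when nothing matches lets the caller decide what to do, whereas calling ET.tostring on a failed find would raise an error.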
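If you are parsing from a file, use getroot():
doc = ET.parse('file.xml')  # ElementTree object wrapping the whole document
tree = doc.getroot()        # root Element, playing the same role as tree above
...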
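Since the question mentions a very large file, a streaming variant with ET.iterparse may be worth considering so the whole tree is not held in memory. The following is a rough sketch under that assumption; extract_subnetwork and SAMPLE are hypothetical names, and the in-memory BytesIO stands in for a real file path.

```python
import io
import xml.etree.ElementTree as ET

# Shortened stand-in for the large file (assumption: same structure as the question).
SAMPLE = b"""<Model version="1" importVersion="12.2">
<Create>
<SubNetwork networkType="WRAN" userLabel="AHPTUR14">
<ManagedElement sourceType="CELLO">
<ManagedElementId string="rbs064841"/>
</ManagedElement>
</SubNetwork>
<SubNetwork networkType="WRAN" userLabel="AHPT78">
<ManagedElement sourceType="CELLO">
<ManagedElementId string="rbs04798"/>
</ManagedElement>
</SubNetwork>
</Create>
</Model>"""


def extract_subnetwork(source, element_id):
    """Stream through source; return the matching SubNetwork as a string, or None."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "SubNetwork":
            continue
        # The SubNetwork is complete at its "end" event, so its children are available.
        for managed_id in elem.iter("ManagedElementId"):
            if managed_id.get("string") == element_id:
                return ET.tostring(elem, encoding="unicode")
        elem.clear()  # discard the contents of non-matching blocks
    return None


# source may be a filename or a binary file object; BytesIO is used here for the demo.
result = extract_subnetwork(io.BytesIO(SAMPLE), "rbs04798")
print(result)
```

The function stops reading as soon as the block is found, so an early match avoids parsing the rest of the file.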
Hope this helps.