Parsing nested XML in Python

I have this XML file:

<?xml version="1.0" ?><XMLSchemaPalletLoadTechData xmlns="http://tempuri.org/XMLSchemaPalletLoadTechData.xsd">
  <TechDataParams>
    <RunNumber>sample</RunNumber>
    <Holder>sample</Holder>
    <ProcessToolName>sample</ProcessToolName>
    <RecipeName>sample</RecipeName>
    <PalletName>sample</PalletName>
    <PalletPosition>sample</PalletPosition>
    <IsControl>sample</IsControl>
    <LoadPosition>sample</LoadPosition>
    <HolderJob>sample</HolderJob>
    <IsSPC>sample</IsSPC>
    <MeasurementType>sample</MeasurementType>
  </TechDataParams>
  <TechDataParams>
    <RunNumber>sample</RunNumber>
    <Holder>sample</Holder>
    <ProcessToolName>sample</ProcessToolName>
    <RecipeName>sample</RecipeName>
    <PalletName>sample</PalletName>
    <PalletPosition>sample</PalletPosition>
    <IsControl>sample</IsControl>
    <LoadPosition>sample</LoadPosition>
    <HolderJob>sample</HolderJob>
    <IsSPC>sample</IsSPC>
    <MeasurementType>XRF</MeasurementType>
  </TechDataParams>
</XMLSchemaPalletLoadTechData>

Here is my code for parsing the XML:

for data in xml.getElementsByTagName('TechDataParams'):
    #parse xml
    runnum=data.getElementsByTagName('RunNumber')[0].firstChild.nodeValue
    hold=data.getElementsByTagName('Holder')[0].firstChild.nodeValue
    processtn=data.getElementsByTagName('ProcessToolName')[0].firstChild.nodeValue
    recipedata=data.getElementsByTagName('RecipeName')[0].firstChild.nodeValue
    palletna=data.getElementsByTagName('PalletName')[0].firstChild.nodeValue
    palletposi=data.getElementsByTagName('PalletPosition')[0].firstChild.nodeValue
    control = data.getElementsByTagName('IsControl')[0].firstChild.nodeValue
    loadpos=data.getElementsByTagName('LoadPosition')[0].firstChild.nodeValue
    holderjob=data.getElementsByTagName('HolderJob')[0].firstChild.nodeValue
    spc = data.getElementsByTagName('IsSPC')[0].firstChild.nodeValue
    mestype = data.getElementsByTagName('MeasurementType')[0].firstChild.nodeValue

But when I print each node, I only get one set of 'TechDataParams', whereas I want to get every 'TechDataParams' from the XML.

Let me know if my question is unclear.

Here is an example. Replace file_path with your own path.

I replaced the RunNumber values with 001 and 002.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from xml.dom import minidom

# Raw string so that \t in the Windows path is not read as a tab character.
file_path = r'C:\temp\test.xml'

doc = minidom.parse(file_path)
TechDataParams = doc.getElementsByTagName('TechDataParams')
# Visit every TechDataParams element and print its RunNumber.
for t in TechDataParams:
    num = t.getElementsByTagName('RunNumber')[0]
    print('num is ', num.firstChild.data)

Output:

num is  001
num is  002
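
If the single set mentioned in the question comes from the variables being overwritten on every pass of the loop (an assumption, since the print statements are not shown), one option is to store each TechDataParams element as its own dict. The sketch below uses a shortened, made-up sample with the same structure as the question's file; SAMPLE_XML, collect_tech_data and records are illustrative names, not part of the original code.

```python
from xml.dom import minidom

# Shortened, hypothetical sample with the same structure as the question's file.
SAMPLE_XML = """<?xml version="1.0" ?>
<XMLSchemaPalletLoadTechData xmlns="http://tempuri.org/XMLSchemaPalletLoadTechData.xsd">
  <TechDataParams>
    <RunNumber>001</RunNumber>
    <MeasurementType>sample</MeasurementType>
  </TechDataParams>
  <TechDataParams>
    <RunNumber>002</RunNumber>
    <MeasurementType>XRF</MeasurementType>
  </TechDataParams>
</XMLSchemaPalletLoadTechData>"""


def collect_tech_data(xml_text):
    """Return one dict per TechDataParams element (illustrative helper)."""
    doc = minidom.parseString(xml_text)
    records = []
    for params in doc.getElementsByTagName('TechDataParams'):
        record = {}
        for child in params.childNodes:
            # Skip the whitespace text nodes that sit between child elements.
            if child.nodeType != child.ELEMENT_NODE:
                continue
            # An empty element has no firstChild, so fall back to None.
            record[child.tagName] = child.firstChild.data if child.firstChild else None
        records.append(record)
    return records


records = collect_tech_data(SAMPLE_XML)
for record in records:
    print(record)
```

Because every element gets its own dict, nothing is lost when the loop moves on to the next TechDataParams, and the fields do not have to be listed by hand.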

Please don't dive into parsing XML with minidom unless you want to pull your hair out.

I would use the xmltodict module here. In one line, you get a list of dictionaries containing all the data you need:

import xmltodict

data = """your xml here"""

data = xmltodict.parse(data)['XMLSchemaPalletLoadTechData']['TechDataParams']
for params in data:
    print(dict(params))

Prints (under Python 2, hence the u'' prefixes):

{u'PalletPosition': u'sample', u'HolderJob': u'sample', u'RunNumber': u'sample', u'ProcessToolName': u'sample', u'RecipeName': u'sample', u'IsControl': u'sample', u'PalletName': u'sample', u'LoadPosition': u'sample', u'MeasurementType': u'sample', u'Holder': u'sample', u'IsSPC': u'sample'}
{u'PalletPosition': u'sample', u'HolderJob': u'sample', u'RunNumber': u'sample', u'ProcessToolName': u'sample', u'RecipeName': u'sample', u'IsControl': u'sample', u'PalletName': u'sample', u'LoadPosition': u'sample', u'MeasurementType': u'XRF', u'Holder': u'sample', u'IsSPC': u'sample'}

This can also be done with the lxml.etree module.

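  1. The input contains a namespace, namely http://tempuri.org/XMLSchemaPalletLoadTechData.xsd
  2. Use the xpath method to find the target TechDataParams tags.
  3. Iterate over the children of each TechDataParams tag and build a dictionary in which the key is the tag name and the value is the tag's text.
  4. Append each dictionary to a list variable, i.e. TechDataParams_info.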

Code:

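from lxml import etree

# content holds the XML text shown in the question.
root = etree.fromstring(content)
ns = {"a": "http://tempuri.org/XMLSchemaPalletLoadTechData.xsd"}
TechDataParams_info = []
for i in root.xpath("//a:XMLSchemaPalletLoadTechData/a:TechDataParams", namespaces=ns):
    temp = dict()
    # Iterate the element directly; getchildren() is deprecated in lxml.
    for j in i:
        # Strip the "{namespace}" prefix that lxml puts in front of each tag name.
        temp[j.tag.split("}", 1)[-1]] = j.text
    TechDataParams_info.append(temp)

print(TechDataParams_info)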

Output (key order may differ):

[{'PalletPosition': 'sample', 'HolderJob': 'sample', 'RunNumber': 'sample', 'ProcessToolName': 'sample', 'RecipeName': 'sample', 'IsControl': 'sample', 'PalletName': 'sample', 'LoadPosition': 'sample', 'MeasurementType': 'sample', 'Holder': 'sample', 'IsSPC': 'sample'}, {'PalletPosition': 'sample', 'HolderJob': 'sample', 'RunNumber': 'sample', 'ProcessToolName': 'sample', 'RecipeName': 'sample', 'IsControl': 'sample', 'PalletName': 'sample', 'LoadPosition': 'sample', 'MeasurementType': 'XRF', 'Holder': 'sample', 'IsSPC': 'sample'}]
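
If installing lxml is not an option, the same namespace-aware approach can be sketched with the standard library's xml.etree.ElementTree. This is a swapped-in alternative, not the code from the answer above, and it runs on a shortened, made-up sample; SAMPLE_XML, NS and records are illustrative names.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map; the prefix "a" is arbitrary, the URI comes from the file.
NS = {'a': 'http://tempuri.org/XMLSchemaPalletLoadTechData.xsd'}

# Shortened, hypothetical sample with the same structure as the question's file.
SAMPLE_XML = """<?xml version="1.0" ?>
<XMLSchemaPalletLoadTechData xmlns="http://tempuri.org/XMLSchemaPalletLoadTechData.xsd">
  <TechDataParams>
    <RunNumber>sample</RunNumber>
    <MeasurementType>sample</MeasurementType>
  </TechDataParams>
  <TechDataParams>
    <RunNumber>sample</RunNumber>
    <MeasurementType>XRF</MeasurementType>
  </TechDataParams>
</XMLSchemaPalletLoadTechData>"""

root = ET.fromstring(SAMPLE_XML)
records = []
# findall() looks among the direct children of the root element.
for params in root.findall('a:TechDataParams', NS):
    # Tags arrive as "{namespace}Name", so keep only the part after "}".
    records.append({child.tag.split('}', 1)[-1]: child.text for child in params})

print(records)
```

ElementTree supports only a subset of XPath, which is enough here because the TechDataParams elements are direct children of the root.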