How to find the starting element name in XML using iterparse

Question

I have the following sample XML:

<osm version="0.6" generator="CGImap 0.3.3 (28791 thorn-03.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
 <bounds minlat="41.9704500" minlon="-87.6928300" maxlat="41.9758200" maxlon="-87.6894800"/>
 <node id="261114295" visible="true" version="7" changeset="11129782" timestamp="2012-03-28T18:31:23Z" user="bbmiller" uid="451048" lat="41.9730791" lon="-87.6866303"/>

I want to extract the bounds and node elements from the XML using Python's iterparse. I tried the following code snippet:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
import pprint

def count_tags(filename):
    # Count how often each 'k' value appears on <tag> elements.
    mytags = {}
    with open(filename, 'rb') as osmfile:
        for event, elem in ET.iterparse(osmfile, events=('end',)):
            if elem.tag == "tag":
                key = elem.attrib['k']
                mytags[key] = mytags.get(key, 0) + 1
    return mytags

But I can't extract bounds and node... what am I missing?

Answer 1

Assuming bounds and node are one level below the XML root, this should work:

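import xml.etree.ElementTree as ET

def count_tags():
    mytags = {}
    # With no events argument, iterparse reports only 'end' events.
    for event, child in ET.iterparse('example.osm'):
        if child.tag in ('bounds', 'node'):
            # Keeps the attributes of the last element seen for each tag.
            mytags[child.tag] = child.attrib
    print(mytags)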

Calling count_tags outputs:

{
    'node': {'changeset': '11129782', 'uid': '451048', 'timestamp': '2012-03-28T18:31:23Z', 'lon': '-87.6866303', 'visible': 'true', 'version': '7', 'user': 'bbmiller', 'lat': '41.9730791', 'id': '261114295'}, 
    'bounds': {'minlat': '41.9704500', 'maxlon': '-87.6894800', 'minlon': '-87.6928300', 'maxlat': '41.9758200'}
}
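
The dictionary above holds one entry per tag, so a file with several node elements would keep only the last one. A minimal sketch of a variant that also reports the name of the starting (root) element by listening for 'start' events, and collects every match in a list, might look like the following. The function name collect_elements, the SAMPLE data, and the second node in it are illustrative assumptions, not part of the original question.

```python
import io
import xml.etree.ElementTree as ET

# Shortened sample; the second <node> is made up for illustration.
SAMPLE = b"""<osm version="0.6">
 <bounds minlat="41.9704500" minlon="-87.6928300" maxlat="41.9758200" maxlon="-87.6894800"/>
 <node id="261114295" lat="41.9730791" lon="-87.6866303"/>
 <node id="261114296" lat="41.9730000" lon="-87.6870000"/>
</osm>"""

def collect_elements(source, wanted=("bounds", "node")):
    """Return (root_tag, {tag: [attribute dicts]}) for the wanted tags."""
    root_tag = None
    found = {tag: [] for tag in wanted}
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root_tag is None:
                # The first 'start' event belongs to the root element.
                root_tag = elem.tag
        elif elem.tag in found:
            # Copy the attributes before clearing the element to free memory.
            found[elem.tag].append(dict(elem.attrib))
            elem.clear()
    return root_tag, found

root_tag, found = collect_elements(io.BytesIO(SAMPLE))
print(root_tag)
print(found)
```

With this sample, root_tag is 'osm' and found['node'] holds two attribute dictionaries. Passing a filename such as 'example.osm' instead of the BytesIO object should work the same way, since iterparse accepts either.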

python

xml

iterparse