Python lxml iterparse() is skipping the first event
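I am using iterparse() from Python's lxml to parse a large XML file and extract the relevant data. This works well, except for the first event: the data of the first node is not captured. The same thing happens when I try to get the tag "way" (not shown in this snippet). Why is the first element not captured?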
tree = etree.iterparse(state_file_xml, events=("start", "end"), tag=('node'))
context = iter(tree)
event, root = context.next()
nodes = {}
for event, elem in context:
    if ((event == 'end') and (elem.tag == 'node')):
        id = elem.get("id")
        lat = float(elem.get("lat"))
        lon = float(elem.get("lon"))
        nodes[id] = [lat, lon]
My XML file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="Overpass API 0.7.55.4 3079d8ea">
<note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>
<meta osm_base="2018-11-09T21:23:02Z"/>
<way id="46916568">
<nd ref="286427634"/>
<nd ref="3371562694"/>
<nd ref="3371562693"/>
<nd ref="1044837456"/>
<nd ref="1299487829"/>
<nd ref="1299487860"/>
<nd ref="284132018"/>
<tag k="highway" v="secondary"/>
<tag k="lit" v="yes"/>
<tag k="maxspeed" v="50"/>
<tag k="name" v="Zürcherstrasse"/>
<tag k="surface" v="asphalt"/>
</way>
<node id="30228243" lat="47.4030908" lon="8.4049015"/>
<node id="283533527" lat="47.4016971" lon="8.4036696"/>
<node id="284132018" lat="47.4034413" lon="8.4042634"/>
<node id="286427571" lat="47.4037481" lon="8.4058661"/>
<node id="286427634" lat="47.4043045" lon="8.4032429"/>
<node id="318217124" lat="47.4044289" lon="8.4054211"/>
<node id="428076175" lat="47.4027948" lon="8.4045078"/>
<node id="460527594" lat="47.4027445" lon="8.4055605"/>
<node id="460527973" lat="47.4029993" lon="8.4040697"/>
<node id="984783907" lat="47.4027808" lon="8.4054934"/>
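The call to context.next() consumes the first node. Because tag=('node') filters the events, the first event the iterator yields is already the start of the first node, not the root element: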
In [14]: tree = etree.iterparse(state_file_xml, events=("start", "end"),tag=('node'))
In [15]: context = iter(tree)
In [16]: event, root = next(context)
In [17]: root.attrib
Out[17]: {'id': '30228243', 'lon': '8.4049015', 'lat': '47.4030908'}
(I changed context.next() to next(context) so that the code works on both Python 2 and Python 3.)
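For contrast, here is a minimal sketch of the unfiltered case, where the "grab the root first" idiom does behave as expected. It uses the standard library's xml.etree.ElementTree rather than lxml (an assumption made only so that it runs without lxml installed; its iterparse has no tag argument), and SAMPLE, first_event and collect_nodes are hypothetical names introduced for illustration. Without a tag filter the first event is the start of the root element, so consuming it loses no node; with lxml's tag filter, as in the session above, the first event is already a node.

```python
import io
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml; no tag= filter here

# Hypothetical, shortened sample modelled on the XML in the question.
SAMPLE = b"""<osm version="0.6">
  <way id="46916568"><nd ref="286427634"/></way>
  <node id="30228243" lat="47.4030908" lon="8.4049015"/>
  <node id="283533527" lat="47.4016971" lon="8.4036696"/>
</osm>"""


def first_event(source):
    """Return (event, tag) for the first event of an unfiltered iterparse."""
    context = ET.iterparse(source, events=("start", "end"))
    event, elem = next(context)
    return event, elem.tag


def collect_nodes(source):
    """Collect node coordinates, filtering by tag inside the loop."""
    context = ET.iterparse(source, events=("start", "end"))
    # Unfiltered parse: this first event is the root's start, not a node.
    event, root = next(context)
    nodes = {}
    for event, elem in context:
        if event == "end" and elem.tag == "node":
            nodes[elem.get("id")] = [float(elem.get("lat")), float(elem.get("lon"))]
    return nodes


print(first_event(io.BytesIO(SAMPLE)))
print(collect_nodes(io.BytesIO(SAMPLE)))
```

Here both nodes are collected, because the event discarded by next(context) belongs to the osm root element rather than to a node.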
By the way, iterparse already returns an iterator, so context = iter(tree) is unnecessary.
Since you only need to process each node once, events=("end",) is sufficient, and the initial next(context) call can simply be dropped:
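import lxml.etree as ET

# Only "end" events for <node> elements are reported, so no
# event or tag check is needed inside the loop.
context = ET.iterparse(state_file_xml, events=("end",), tag=('node'))
nodes = {}
for event, elem in context:
    node_id = elem.get("id")  # avoid shadowing the built-in id()
    lat = float(elem.get("lat"))
    lon = float(elem.get("lon"))
    nodes[node_id] = [lat, lon]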
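Since the question mentions a large file, it may also be worth releasing each element once it has been read. The following is only a sketch of a commonly used lxml pattern, not something the code above requires; SAMPLE and read_nodes are hypothetical names, and the inline sample stands in for state_file_xml.

```python
import io

from lxml import etree

# Hypothetical, shortened sample modelled on the XML in the question.
SAMPLE = b"""<osm version="0.6">
  <way id="46916568"><nd ref="286427634"/></way>
  <node id="30228243" lat="47.4030908" lon="8.4049015"/>
  <node id="283533527" lat="47.4016971" lon="8.4036696"/>
</osm>"""


def read_nodes(source):
    """Map node id -> [lat, lon], freeing each element after it is read."""
    nodes = {}
    for _event, elem in etree.iterparse(source, events=("end",), tag="node"):
        nodes[elem.get("id")] = [float(elem.get("lat")), float(elem.get("lon"))]
        # Drop the element's own content (attributes, text, children).
        elem.clear()
        # Detach siblings that were already parsed so the tree does not
        # keep growing while the rest of the file is read.
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    return nodes


print(read_nodes(io.BytesIO(SAMPLE)))
```

The attribute values are copied into the dictionary before elem.clear() is called, so clearing does not affect the collected data. Whether the extra cleanup is worthwhile depends on the file size.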