lxml etree: find the closest element before
The XML document is structured as follows:
<a>
  <b>
    <d/>
  </b>
  <c attr1="important"/>
  <b>
    <d/>
  </b>
  <c attr1="so important" />
  <b></b>
</a>
My parser first collects all <d> elements:
from lxml import etree
xmltree = etree.parse(document)
elems = xmltree.xpath('//d')
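The task now is: get the attribute from the closest <c> tag that comes before the current <d> tag, if there is one.
The naive approach would be something like this: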
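for el in elems:
    it = el.getparent()
    # Walk backwards through previous siblings, climbing to the parent
    # whenever a level runs out of siblings, until a <c> is found.
    while it is not None and it.tag != 'c':
        prev = it.getprevious()
        if prev is None:
            it = it.getparent()
        else:
            it = prev
    if it is not None:
        print(el, it.get("attr1"))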
But this doesn't look straightforward to me. Am I missing something in the documentation? How can I solve this without implementing my own iterator?
Use the preceding axis:
The preceding axis indicates all the nodes that precede the context node in the document except any ancestor, attribute and namespace nodes.
for el in elems:
    try:
        # Results come back in document order, so the last match is the closest one.
        print(el.xpath("preceding::c[@attr1]")[-1].get("attr1"))
    except IndexError:
        print("No preceding 'c' element.")
Demo:
>>> from lxml import etree
>>>
>>> data = """
... <a>
... <b>
... <d/>
... </b>
...
... <c attr1="important"/>
... <b>
... <d/>
... </b>
... <c attr1="so important" />
... <b></b>
... </a>
... """
>>> xmltree = etree.fromstring(data)
>>> elems = xmltree.xpath('//d')
>>>
>>> for el in elems:
...     try:
...         print(el.xpath("preceding::c[@attr1]")[-1].get("attr1"))
...     except IndexError:
...         print("No preceding 'c' element.")
...
No preceding 'c' element.
important
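As a possible variation on the loop above, the "closest" selection can be pushed into the XPath expression itself. In XPath 1.0, a positional predicate on a reverse axis such as preceding counts from the context node outwards, so [1] picks the nearest match. This is only a sketch: the helper name closest_preceding_attr and its tag and attr parameters are made up for illustration and are not part of lxml.

```python
from lxml import etree


def closest_preceding_attr(el, tag="c", attr="attr1"):
    """Return `attr` of the nearest preceding <tag> that has it, or None.

    Hypothetical helper for illustration; not part of the lxml API.
    """
    # On a reverse axis, position 1 is the node closest to the context node,
    # so the result list holds at most one element.
    hits = el.xpath("preceding::%s[@%s][1]" % (tag, attr))
    return hits[0].get(attr) if hits else None


data = """
<a>
  <b>
    <d/>
  </b>
  <c attr1="important"/>
  <b>
    <d/>
  </b>
  <c attr1="so important" />
  <b></b>
</a>
"""

root = etree.fromstring(data)
results = [closest_preceding_attr(d) for d in root.xpath("//d")]
print(results)
```

Returning None instead of raising IndexError keeps the "if there is one" case out of a try/except block; which style reads better is a matter of taste.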