How do I look for a tag in an XML file and find its grandparent?
I have an XML file that I want to parse and search for certain keywords.
The XML file looks like this:
...
...
<OBJECT data="file://localhost//var/tmp/autoclean/derive/TheGeometry//Descartes-TheGeometry.djvu" height="3143" type="image/x.djvu" usemap="Descartes-TheGeometry_0269.djvu" width="2077">
<PARAM name="PAGE" value="Descartes-TheGeometry_0269.djvu"/>
<PARAM name="DPI" value="400"/>
<HIDDENTEXT>
<PAGECOLUMN>
<REGION>
<PARAGRAPH>
<LINE>
<WORD coords="653,237,937,202,236">CATALOGUE</WORD>
<WORD coords="962,238,1022,205,237">OF</WORD>
<WORD coords="1045,240,1208,205,238">DOVER</WORD>
<WORD coords="1231,239,1389,205,238">BOOKS</WORD>
</LINE>
...
</PARAGRAPH>
...
...
</HIDDENTEXT>
</OBJECT>
...
...
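Now I want to search for a keyword in the <WORD> tags and read the value attribute of the first <PARAM> tag of the <OBJECT> element that contains the match.
For example, suppose I search for the keyword BOOKS. I then want to get the value from this tag: <PARAM name="PAGE" value="Descartes-TheGeometry_0269.djvu"/>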
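Try something like this:
import lxml.html as lh

# Paste the XML here. lxml.html parses it as HTML, which lowercases tag names.
books = """[your XML]"""
doc = lh.fromstring(books)
# String comparison in XPath is case-sensitive, so the keyword must match the WORD text exactly.
vals = doc.xpath('//object/param[following-sibling::hiddentext//word="BOOKS"][1]/@value')
for val in vals:
    print(val)
Output:
Descartes-TheGeometry_0269.djvu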
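If you would rather not depend on lxml, roughly the same lookup can be sketched with the standard library's xml.etree.ElementTree. This is only a sketch under a few assumptions: the file is well-formed XML, the `<DOC>` wrapper below is a hypothetical root element (the real root is elided in the question), and `find_page_values` is a helper name made up for illustration. ElementTree does not support the ancestor axis, so the code builds a child-to-parent map and climbs from each matching WORD up to its enclosing OBJECT.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample from the question, closed so that it is
# well-formed. The <DOC> wrapper is a hypothetical root element.
SAMPLE = """<DOC>
<OBJECT type="image/x.djvu" usemap="Descartes-TheGeometry_0269.djvu">
<PARAM name="PAGE" value="Descartes-TheGeometry_0269.djvu"/>
<PARAM name="DPI" value="400"/>
<HIDDENTEXT><PAGECOLUMN><REGION><PARAGRAPH><LINE>
<WORD coords="1045,240,1208,205,238">DOVER</WORD>
<WORD coords="1231,239,1389,205,238">BOOKS</WORD>
</LINE></PARAGRAPH></REGION></PAGECOLUMN></HIDDENTEXT>
</OBJECT>
</DOC>"""


def find_page_values(xml_text, keyword):
    """Return the first PARAM value of each OBJECT containing `keyword`."""
    root = ET.fromstring(xml_text)
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    values = []
    seen = set()  # ids of OBJECT elements already reported
    for word in root.iter("WORD"):
        # Exact, case-sensitive match on the text of the WORD element.
        if (word.text or "").strip() != keyword:
            continue
        # Climb from the matching WORD up to its enclosing OBJECT.
        node = word
        while node is not None and node.tag != "OBJECT":
            node = parent_of.get(node)
        if node is None or id(node) in seen:
            continue
        seen.add(id(node))
        first_param = node.find("PARAM")  # first direct PARAM child
        if first_param is not None:
            values.append(first_param.get("value"))
    return values


print(find_page_values(SAMPLE, "BOOKS"))
```

Unlike the lxml.html version, this parses the input as XML, so tag names keep their original upper case and a lowercase keyword such as "books" does not match.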