Can I access the subchild of a parent in XPath?

Question

As the title says, I have some HTML code from http://chem.sis.nlm.nih.gov/chemidplus/name/acetone that I am parsing, and I want to extract some data from it, such as the Acetone under MeSH Heading (as in my similar post).

The relevant HTML:

<div id="names">
 <h2>Names and Synonyms</h2>
  <div class="ds">
   <button class="toggle1Col" title="Toggle display between 1 column of wider results and multiple columns.">&#8596;</button>
 <h3>Name of Substance</h3>
 <div class="yui3-g-r">
  <div class="yui3-u-1-4">
   <ul>
    <li id="ds2">
     <div>2-Propanone</div>
    </li>
   </ul>
  </div>
  <div class="yui3-u-1-4">
   <ul>
    <li id="ds3">
     <div>Acetone</div>
    </li>
   </ul>
  </div>
  <div class="yui3-u-1-4">
   <ul>
    <li id="ds4">
     <div>Acetone [NF]</div>
    </li>
   </ul>
  </div>
  <div class="yui3-u-1-4">
   <ul>
    <li id="ds5">
     <div>Dimethyl ketone</div>
    </li>
   </ul>
  </div>
 </div>
 <h3>MeSH Heading</h3>
  <ul>
   <li id="ds6">
    <div>Acetone</div>
   </li>
  </ul>
 </div>
</div>

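Previously, on other pages, I would use mesh_name = tree.xpath('//*[text()="MeSH Heading"]/..//div')[1].text_content() to extract the data, since the other pages had a similar structure. I now see that this does not always hold, because I did not account for inconsistencies between pages. So, is there a way to go to the node I want and then get its subchild, so that the extraction is consistent across pages?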

Would tree.xpath('//*[text()="MeSH Heading"]//preceding-sibling::text()[1]') work?

Answer 1

As I understand it, you need to get a list of items by heading.

How about a reusable function that works for every heading in the "Names and Synonyms" container:

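from lxml.html import parse

# Fetch and parse the page (requires network access)
tree = parse("http://chem.sis.nlm.nih.gov/chemidplus/name/acetone")

def get_contents_by_title(tree, title):
    # Find the <h3> with the given title, step to the element right after it,
    # and collect the text of every <div> inside that element
    return tree.xpath("//h3[. = '%s']/following-sibling::*[1]//div/text()" % title)

print(get_contents_by_title(tree, "Name of Substance"))
print(get_contents_by_title(tree, "MeSH Heading"))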

Prints:

['2-Propanone', 'Acetone', 'Acetone [NF]', 'Dimethyl ketone']
['Acetone']
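
To try the idea without fetching the page, here is a minimal offline sketch. It assumes lxml is installed and that the trimmed SNIPPET below (copied from the markup in the question, not from the live page) is representative. The normalize-space() filters and the $title XPath variable are additions for illustration rather than part of the function above: the first skips whitespace-only text nodes in indented markup, the second avoids building the expression by string formatting.

```python
from lxml import html

# Trimmed copy of the markup shown in the question (assumption: the live
# page may be laid out differently).
SNIPPET = """
<div id="names">
 <h2>Names and Synonyms</h2>
 <div class="ds">
  <h3>Name of Substance</h3>
  <div class="yui3-g-r">
   <div class="yui3-u-1-4">
    <ul><li id="ds2"><div>2-Propanone</div></li></ul>
   </div>
   <div class="yui3-u-1-4">
    <ul><li id="ds3"><div>Acetone</div></li></ul>
   </div>
   <div class="yui3-u-1-4">
    <ul><li id="ds4"><div>Acetone [NF]</div></li></ul>
   </div>
   <div class="yui3-u-1-4">
    <ul><li id="ds5"><div>Dimethyl ketone</div></li></ul>
   </div>
  </div>
  <h3>MeSH Heading</h3>
  <ul>
   <li id="ds6"><div>Acetone</div></li>
  </ul>
 </div>
</div>
"""

tree = html.fromstring(SNIPPET)


def get_contents_by_title(tree, title):
    # $title is an XPath variable; lxml binds it from the keyword argument.
    # text()[normalize-space()] keeps only text nodes that are not blank.
    return tree.xpath(
        "//h3[normalize-space(.) = $title]"
        "/following-sibling::*[1]//div/text()[normalize-space()]",
        title=title,
    )


print(get_contents_by_title(tree, "Name of Substance"))
print(get_contents_by_title(tree, "MeSH Heading"))

# The expression proposed in the question looks backwards from the heading,
# so it cannot reach the list that comes after it.
attempt = tree.xpath('//*[text()="MeSH Heading"]//preceding-sibling::text()[1]')
print([s for s in attempt if s.strip()])
```

The key step is following-sibling::*[1], which moves from the heading to whatever element comes right after it, so the same function works whether that element is a div grid or a plain ul. The preceding-sibling axis from the question selects nodes before the heading instead, which is why it does not find the value.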

Tags: html, python, xpath, lxml, lxml.html