XPath expression: selecting text nodes between element nodes

Question

Given the following HTML, I want to extract TextA, TextC, and TextE.

<div id='content'>
    TextA
    <br/>
    <br/>
    <p>TextB</p>
    TextC
    <br/>
    TextC
    <p>TextD</p>
    TextE
</div>

I tried to get TextC like this, but did not get the result I wanted:

Query:
//*[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]
Expected result:
["TextC", <br/>, "TextC"]
Actual result:
[<br/>]

Is there a way to select the text nodes without using an index, as in //div/text()[1]?

Answer 1

The two text nodes are missing from your XPath result because * matches only element nodes. To match both elements and text nodes, use node() instead:

//node()[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]
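
To see the difference between * and node() in practice, here is a minimal sketch in Python. It assumes the third-party lxml library is available (the question does not name a tool); the names SNIPPET, CONDITION, and describe are made up for this illustration.

```python
from lxml import etree  # assumption: lxml is installed (pip install lxml)

# The markup from the question; it is well-formed, so an XML parser accepts it.
SNIPPET = """<div id='content'>
    TextA
    <br/>
    <br/>
    <p>TextB</p>
    TextC
    <br/>
    TextC
    <p>TextD</p>
    TextE
</div>"""

root = etree.fromstring(SNIPPET)

# The predicate shared by both queries (hypothetical helper constant).
CONDITION = '[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]'


def describe(node):
    """Render a text node as its stripped string and an element as <tag/>."""
    return node.strip() if isinstance(node, str) else f"<{node.tag}/>"


# * matches element nodes only, so the text nodes are skipped.
elements_only = [describe(n) for n in root.xpath("//*" + CONDITION)]

# node() matches any node type, including text nodes.
all_nodes = [describe(n) for n in root.xpath("//node()" + CONDITION)]

print(elements_only)  # ['<br/>']
print(all_nodes)      # ['TextC', '<br/>', 'TextC']
```

The text nodes that lxml returns still carry the surrounding line breaks and indentation, which is why the helper strips them before printing.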

Or, if you only want the text nodes, i.e. excluding <br/>, use text() instead of node():

//text()[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]
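
As a quick check of the text() variant, the following sketch runs the expression against the markup from the question. It again assumes the third-party lxml library; SNIPPET, QUERY, and between are illustrative names, not part of the original answer.

```python
from lxml import etree  # assumption: lxml is installed (pip install lxml)

# The markup from the question, parsed as XML since it is well-formed.
SNIPPET = """<div id='content'>
    TextA
    <br/>
    <br/>
    <p>TextB</p>
    TextC
    <br/>
    TextC
    <p>TextD</p>
    TextE
</div>"""

root = etree.fromstring(SNIPPET)

# Text nodes that come after the TextB paragraph and before the TextD paragraph.
QUERY = '//text()[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]'

# Each result includes the surrounding whitespace, so strip it.
between = [t.strip() for t in root.xpath(QUERY)]

print(between)  # ['TextC', 'TextC']
```

The text inside the two p elements is not selected. The preceding and following axes exclude ancestors, so the string TextB has no preceding p and the string TextD has no following p.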
