XPath expression for content in a node until a node with string is encountered

Question

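I am looking for an XPath expression that gets the content of an article without its references section. I want everything in the article element up to the <p> tag that contains "References".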

//root/main/article/following-sibling::p[.="References"]

<root>
    <main>
        <article>
            <p>
               The stunning increase in homelessness announced in Los Angeles 
               this week — up 16% over last year citywide — was an almost  an 
               incomprehensible conundrum given the nation's booming economy 
               and the hundreds of millions of dollars that city, county and 
               state officials have directed toward the problem.
            </p>
            <p>
                "We cannot let a set of difficult numbers discourage us 
                or weaken our resolve" Garcetti said.
            </p>
            <p>
                References: Maeve Reston, CNN
            </p>
        </article>
    </main>
</root>

The result I am looking for is as follows.

<p>
    The stunning increase in homelessness announced in Los Angeles
    this week — up 16% over last year citywide — was an almost  an
    incomprehensible conundrum given the nation's booming economy
    and the hundreds of millions of dollars that city, county and
    state officials have directed toward the problem.
</p>
<p>
    "We cannot let a set of difficult numbers discourage us
    or weaken our resolve" Garcetti said.
</p>

Answer 1

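This XPath,

/root/main/article/p[starts-with(normalize-space(),'References')]
                  /preceding-sibling::p

will select the p elements that precede the "References" paragraph. The predicate uses starts-with(normalize-space(), ...) because that paragraph's text is padded with whitespace and continues past the word "References", so an exact comparison such as .="References" would not match it.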

If you only want the text node children of those p elements, you can append /text().
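For anyone who wants to check what that selection amounts to, here is a minimal sketch in Python using only the standard library. Note that xml.etree.ElementTree does not support the preceding-sibling axis or starts-with(), so the sketch reproduces the logic by hand instead of running the XPath itself. The helper names normalize_space and paragraphs_before_references are made up for illustration, and the first paragraph of the sample is abbreviated from the question.

```python
import xml.etree.ElementTree as ET

# Sample document; the first paragraph is shortened from the question's XML.
XML = """<root>
    <main>
        <article>
            <p>
                The stunning increase in homelessness announced in Los Angeles.
            </p>
            <p>
                "We cannot let a set of difficult numbers discourage us
                or weaken our resolve" Garcetti said.
            </p>
            <p>
                References: Maeve Reston, CNN
            </p>
        </article>
    </main>
</root>"""


def normalize_space(text):
    """Roughly mimic XPath normalize-space(): trim and collapse whitespace."""
    return " ".join((text or "").split())


def paragraphs_before_references(article):
    """Hypothetical helper: hand-written equivalent of
    p[starts-with(normalize-space(),'References')]/preceding-sibling::p.

    The XPath returns the p siblings that precede any matching paragraph,
    which is every p before the last match, or nothing if there is no match.
    """
    paragraphs = article.findall("p")
    matches = [
        i for i, p in enumerate(paragraphs)
        if normalize_space("".join(p.itertext())).startswith("References")
    ]
    return paragraphs[:matches[-1]] if matches else []


root = ET.fromstring(XML)
article = root.find("./main/article")
selected = paragraphs_before_references(article)

# Counterpart of appending /text(): these sample paragraphs have no child
# elements, so p.text is their only text node.
texts = [normalize_space(p.text) for p in selected]
for t in texts:
    print(t)
```

With a full XPath 1.0 engine, such as PHP's DOMXPath, the expression from the answer can be passed to the query directly and no loop is needed.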

Tags: php, xml, xpath, domxpath, xml-parsing