XPath until next tag

Question

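This is similar to questions others have asked here before, but I don't know how to apply the suggestions from those answers, so I need some help.

I want to find nodes in an HTML document with a structure like the following (an excerpt; the actual content may vary):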

<h2>My title 1</h2>
<h3>Sub-heading</h3>
<p>...<span><a href='#'>...</a></span></p>
<div>...</div>
<h2>My title 2</h2>
<p>No sub-heading here :O</p>
<h3>But here</h3>
<p>No link</p>
<h2>And so on...</h2>
<p>...</p>

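What I want to do is find all nodes from one h2 up to the last element before the next h2, including the h2 itself. In my example, I want to retrieve "blocks" like these: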

Block 1:

<h2>My title 1</h2>
<h3>Sub-heading</h3>
<p>...<span><a href='#'>...</a></span></p>
<div>...</div>

Block 2:

<h2>My title 2</h2>
<p>No sub-heading here :O</p>
<h3>But here</h3>
<p>No link</p>

Block 3:

<h2>And so on...</h2>
<p>...</p>

Apart from the h2 elements, I have nothing to target (no ids, no text content I know in advance, nothing fixed, etc.).

Answer 1

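You can use DOMXPath and its query method.

First, find all the h2 elements that are direct children of body (not nested h2 elements).

Then start a foreach loop over the h2 elements you found. Add the current h2 to the array $set, since you want to keep it. Then loop over its following siblings and add them to $set until you reach the next h2.

Finally, add $set to the $sets array.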
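To see why the absolute path matters, here is a small check against a made-up fragment (the markup is an assumption for illustration only). DOMDocument::loadHTML() wraps a bare fragment in implied html and body elements, so /html/body/h2 matches only the top-level headings, while //h2 also matches the one nested in a div.

```php
<?php
// Hypothetical fragment: two top-level h2 elements plus one nested inside a div.
$html = '<h2>A</h2><p>x</p><h2>B</h2><div><h2>Nested</h2></div>';

$doc = new DOMDocument();
// libxml's HTML parser adds implied <html><body> around a bare fragment,
// which is why an absolute path starting at /html/body works here.
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$all      = $xpath->query('//h2');          // every h2, at any depth
$topLevel = $xpath->query('/html/body/h2'); // only h2 elements that are direct children of body

echo $all->length, "\n";      // 3
echo $topLevel->length, "\n"; // 2
```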

For example:

<?php

$html = <<<HTML
<h2>My title 1</h2>
<h3>Sub-heading</h3>
<p>...<span><a href='#'>...</a></span></p>
<div>...</div>
<h2>My title 2</h2>
<p>No sub-heading here :O</p>
<h3>But here</h3>
<p>No link</p>
<h2>And so on...</h2>
<p>...</p>
<div><h2>This is nested</h2></div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
$domNodeList = $xpath->query('/html/body/h2');

$sets = array();

foreach($domNodeList as $element) {
    // Save the h2
    $set = array($element);

    // Loop over the following siblings until the next h2
    while ($element = $element->nextSibling) {
        if ($element->nodeName === "h2") {
            break;
        }
        // Keep only element nodes (this skips the whitespace text nodes between tags)
        if ($element->nodeType === XML_ELEMENT_NODE) {
            $set[] = $element;
        }
    }

    $sets[] = $set;
}

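$sets will now contain 3 arrays holding the DOMElements you added. The div containing the nested h2 ends up in the third set, because only top-level h2 elements start a new set.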
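If you would rather let XPath 1.0 do the grouping (closer to the "until next tag" wording of the question), one possible sketch counts preceding h2 siblings: an element belongs to block i when it is not itself an h2 and exactly i top-level h2 elements come before it. This is an alternative technique, not part of the answer above, and it assumes every heading of interest is a direct child of body.

```php
<?php
// The markup from the question, assumed to be loaded as a bare fragment.
$html = <<<HTML
<h2>My title 1</h2>
<h3>Sub-heading</h3>
<p>...<span><a href='#'>...</a></span></p>
<div>...</div>
<h2>My title 2</h2>
<p>No sub-heading here :O</p>
<h3>But here</h3>
<p>No link</p>
<h2>And so on...</h2>
<p>...</p>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$blocks = [];
$count = $xpath->query('/html/body/h2')->length;

for ($i = 1; $i <= $count; $i++) {
    // The i-th h2, united with its following element siblings that are not h2
    // and have exactly i h2 siblings before them (i.e. they sit before h2[i+1]).
    // XPath returns the union in document order.
    $expr = sprintf(
        '/html/body/h2[%1$d] | /html/body/h2[%1$d]/following-sibling::*'
        . '[not(self::h2) and count(preceding-sibling::h2) = %1$d]',
        $i
    );
    $blocks[] = iterator_to_array($xpath->query($expr), false);
}

// Print each block as HTML, one element per line.
foreach ($blocks as $n => $nodes) {
    echo "Block ", $n + 1, ":\n";
    foreach ($nodes as $node) {
        echo $doc->saveHTML($node), "\n";
    }
}
```

Each query re-counts preceding siblings, so the cost grows roughly quadratically with the number of body children; for large documents the single nextSibling walk in the answer is probably cheaper.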

