Find and extract the content of a division with a certain class using DomXPath

Question

I'm trying to extract the content of a specific section of a remote page and save it into a PHP string (or array). The section looks like this:

<section class="intro">
        <div class="container">
            <h1>Student Club</h1>
            <h2>Subtitle</h2>
            <p>Lore ipsum paragraph.</p>
        </div>
</section>

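I can't narrow it down by the class "container", because several other sections on the same page use that class too. Since the only unique class of this section is "intro", I use the following code to find the right division: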

$doc = new DOMDocument;
$doc->preserveWhiteSpace = FALSE;
@$doc->loadHTMLFile("https://www.remotesite.tld/remotepage.html");
$finder = new DomXPath($doc);
$intro = $finder->query("//*[contains(@class, 'intro')]");
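A side note on the query itself: contains(@class, 'intro') is a plain substring test, so it would also match an element whose class is, say, "introduction". The sketch below compares it with the common whole-token idiom. The inline HTML is a made-up stand-in for the remote page so the example runs without network access.

```php
<?php
// Sketch: substring match vs. whole-token class match.
// The HTML below is a made-up stand-in for the remote page.
$html = <<<HTML
<html><body>
<section class="introduction"><div class="container"><p>Not this one</p></div></section>
<section class="hero intro"><div class="container"><h1>Student Club</h1></div></section>
</body></html>
HTML;

libxml_use_internal_errors(true); // suppress libxml warnings about HTML5 tags such as <section>
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($doc);

// Substring test: also matches class="introduction".
$loose = $finder->query("//*[contains(@class, 'intro')]");

// Pad the normalized class list with spaces so only the exact token "intro" matches.
$strict = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' intro ')]");

echo $loose->length, "\n";  // 2
echo $strict->length, "\n"; // 1
```

On the real page the substring version may be good enough if no other class contains "intro"; the padded form just removes that assumption.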

At this point I'm stuck: I can't get the contents of $intro out as a PHP string.

Trying the following code:

foreach ($intro as $item) {
    $string = $item->nodeValue;
    echo $string;
}

only returns the text values, with all tags stripped, but I really need to keep all those div, h1, h2 and p tags for further processing.
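For comparison, a minimal sketch of why nodeValue loses the tags and one way to keep them: nodeValue (like textContent) is just the concatenated text of all descendant text nodes, whereas DOMDocument::saveHTML() serializes a node together with its markup. The inline HTML stands in for the remote page, and innerHtml() is a hypothetical helper name, not a built-in.

```php
<?php
// Sketch: nodeValue returns only text; serializing the child nodes keeps the tags.
// The HTML string is an assumed stand-in for the remote page.
$html = '<html><body><section class="intro"><div class="container">'
      . '<h1>Student Club</h1><h2>Subtitle</h2><p>Lore ipsum paragraph.</p>'
      . '</div></section></body></html>';

libxml_use_internal_errors(true); // <section> is unknown to libxml's HTML4 parser
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$section = (new DOMXPath($doc))->query("//section[@class='intro']")->item(0);

// Text only: every element tag is dropped.
$text = $section->nodeValue; // "Student ClubSubtitleLore ipsum paragraph."

// Hypothetical helper: concatenates the saveHTML() output of each child node,
// i.e. the "inner HTML" of $node without $node's own tag.
function innerHtml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return $out;
}

$markup = innerHtml($section); // contains <div class="container"><h1>... with tags intact
echo $markup, "\n";
```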

Trying this instead:

foreach ($intro->attributes as $attr) {
    $name = $attr->nodeName;
    $value = $attr->nodeValue;
    echo $name;
    echo $value;
}

gives the error:

Notice: Undefined property: DOMNodeList::$attributes in
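The notice appears because DOMXPath::query() returns a DOMNodeList, which is a collection of nodes; attributes is a property of the individual nodes (DOMElement objects), not of the list. A minimal sketch, again using inline HTML as an assumed stand-in for the remote page:

```php
<?php
// Sketch: query() returns a DOMNodeList, which has no ->attributes;
// attributes live on the DOMElement objects inside the list.
$html = '<html><body><section class="intro" id="top"><p>Hi</p></section></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$intro = (new DOMXPath($doc))->query("//*[contains(@class, 'intro')]");

echo get_class($intro), "\n"; // DOMNodeList
echo $intro->length, "\n";    // 1

$attrs = [];
foreach ($intro as $element) {                   // each item is a DOMElement
    foreach ($element->attributes as $attr) {    // DOMNamedNodeMap of DOMAttr
        $attrs[$attr->nodeName] = $attr->nodeValue;
    }
}
// $attrs now holds ['class' => 'intro', 'id' => 'top']

// Or take the first match directly (item() returns null if nothing matched).
$first = $intro->item(0);
echo $first->getAttribute('id'), "\n"; // top
```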

So how can I extract the complete HTML code of the found DOM elements?
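One common approach, sketched here on the assumption that the full outer markup of each match is what's wanted: DOMDocument::saveHTML() accepts an optional node argument (PHP 5.3.6 and later) and then returns only that node's HTML, including its own tag. Collecting one string per match gives the string (or array) the question asks for. The inline HTML is a made-up stand-in for the remote page.

```php
<?php
// Sketch: outer HTML of every matched element via saveHTML($node).
// The HTML below is an assumed stand-in for the remote page.
$html = <<<HTML
<html><body>
<section class="intro">
    <div class="container">
        <h1>Student Club</h1>
        <h2>Subtitle</h2>
        <p>Lore ipsum paragraph.</p>
    </div>
</section>
<section class="other"><div class="container"><p>Skip me</p></div></section>
</body></html>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($doc);
$intro = $finder->query("//*[contains(@class, 'intro')]");

// One string per matched element, with all tags preserved.
$sections = [];
foreach ($intro as $item) {
    $sections[] = $doc->saveHTML($item);
}

echo $sections[0], "\n"; // starts with <section class="intro">
```

If only the inside of the section is needed, serialize its child nodes one by one instead of the section itself.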

Answer 1

I knew I was close... I just needed to do this:

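foreach ($intro as $item) {
    // Each $item is a DOMElement, so getElementsByTagName() can be called on it.
    // Each call returns a DOMNodeList of the matching descendants.
    $h1 = $item->getElementsByTagName('h1');
    $h2 = $item->getElementsByTagName('h2');
    $p  = $item->getElementsByTagName('p');
}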
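Note that getElementsByTagName() also returns a DOMNodeList, so $h1, $h2 and $p in the answer are lists rather than strings, and they are overwritten on each loop iteration. A minimal sketch that reads the first h1, h2 and p inside each match as both text and markup; the inline HTML and the layout of the $parts array are assumptions for illustration.

```php
<?php
// Sketch: per-tag extraction from each matched section.
// The HTML string and the $parts structure are illustrative assumptions.
$html = '<html><body><section class="intro"><div class="container">'
      . '<h1>Student Club</h1><h2>Subtitle</h2><p>Lore ipsum paragraph.</p>'
      . '</div></section></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$intro = (new DOMXPath($doc))->query("//*[contains(@class, 'intro')]");

$parts = [];
foreach ($intro as $item) {
    foreach (['h1', 'h2', 'p'] as $tag) {
        // item(0) is the first match, or null if the tag is absent.
        $first = $item->getElementsByTagName($tag)->item(0);
        if ($first !== null) {
            $parts[$tag] = [
                'text' => $first->textContent,      // text only
                'html' => $doc->saveHTML($first),   // element with its tags
            ];
        }
    }
}

print_r($parts);
```

If a tag can occur more than once, iterate over the whole DOMNodeList instead of taking item(0).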

