Modifying DOM once causes subsequent modifications to error

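I'm trying to wrap all instances of certain phrases in a <span> using PHP's DOMDocument and XPath. My logic is based on this answer from another post, but that only lets me select the first match within a node, whereas I need to select all matches.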

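Once I've modified the DOM for the first match, the subsequent loop iterations cause an error, Fatal error: Uncaught Error: Call to a member function splitText() on bool, on the line where $after is assigned. I'm pretty sure this is caused by modifying the markup, but I haven't been able to figure out why.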

What am I doing wrong?

/**
 * Automatically wrap various forms of CCJM in a class for branding purposes
 *
 * @link 
 *
 * @param string $content
 * @return string
 */
function ccjm_branding_filter(string $content): string {
    if (! (is_admin() && ! wp_doing_ajax()) && $content) {
        $DOM = new DOMDocument();

        /**
         * Use internal errors to get around HTML5 warnings
         */
        libxml_use_internal_errors(true);

        /**
         * Load in the content, with proper encoding and an `<html>` wrapper required for parsing
         */
        $DOM->loadHTML("<?xml encoding='utf-8' ?><html>{$content}</html>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

        /**
         * Clear errors to get around HTML5 warnings
         */
        libxml_clear_errors();

        /**
         * Initialize XPath
         */
        $XPath = new DOMXPath($DOM);

        /**
         * Retrieve all text nodes, except those within scripts
         */
        $text = $XPath->query("//text()[not(parent::script)]");

        foreach ($text as $node) {
            /**
             * Find all matches, including offset
             */
            preg_match_all("/(C\.? ?C\.?(?:JM| Johnson (?:&|&amp;|&#38;|and) Malhotra)(?: Engineers, LTD\.?|, P\.?C\.?)?)/i", $node->textContent, $matches, PREG_OFFSET_CAPTURE);

            /**
             * Wrap each match in appropriate span
             */
            foreach ($matches as $group) {
                foreach ($group as $key => $match) {
                    /**
                     * Determine the offset and the length of the match
                     */
                    $offset = $match[1];
                    $length = strlen($match[0]);

                    /**
                     * Isolate the match and what comes after it
                     */
                    $word  = $node->splitText($offset);
                    $after = $word->splitText($length);

                    /**
                     * Create the wrapping span
                     */
                    $span = $DOM->createElement("span");
                    $span->setAttribute("class", "__brand");

                    /**
                     * Replace the word with the span, and then re-insert the word within it
                     */
                    $word->parentNode->replaceChild($span, $word);
                    $span->appendChild($word);

                    break; // it always errors after the first loop
                }
            }
        }

        /**
         * Save changes, remove unneeded tags
         */
        $content = implode(array_map([$DOM->documentElement->ownerDocument, "saveHTML"], iterator_to_array($DOM->documentElement->childNodes)));
    }

    return $content;
}
add_filter("ccjm_final_output", "ccjm_branding_filter");

Sample content (it should match every instance of "C.C. Johnson & Malhotra, P.C." and "CCJM", but only the first one is modified successfully):

C.C. Johnson & Malhotra, P.C. (CCJM) was an integral member of a large Design Team for a 16.5-mile-long Public-Private Partnership (P3) Purple Line Project. The east-west light rail system extends from New Carrollton in PG County, MD to Bethesda in MO County, MD with 21 stations and one short tunnel. CCJM was Engineer of Record (EOR) for the design of eight (8) Bridges and design reviews for 35 transit/highway bridges and over 100 retaining walls of different lengths/types adjacent to bridges and in areas of cut/fill. CCJM designed utility structures for 42,000 LF of relocated water mains and 19,000 LF of relocated sewer mains meeting Washington Suburban Sanitary Commission (WSSC), Md Dept of Transportation (MDOT) MTA, and Local Standards.

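Edit 1: After some testing, I found that when I output $node->textContent, it changes after the first loop... so I think what's happening is that after I run $node->splitText($offset), the node itself is actually modified, so the subsequent offsets no longer line up.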
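A minimal sketch of what the edit observes, using a short made-up string rather than the real content: after splitText(), the node it was called on keeps only the text in front of the split point, and the rest moves into a new sibling text node. In the question's loop the first match in the sample starts at offset 0, so $node is left empty, and the next match's offset then lies past the end of the node; that is consistent with splitText() returning false and the $after line failing with "on bool".

```php
<?php
// Shows that splitText() shortens the node it is called on, so offsets computed
// from the original, longer text no longer line up with that node afterwards.
$dom  = new DOMDocument();
$p    = $dom->appendChild($dom->createElement('p'));
$node = $p->appendChild($dom->createTextNode('CCJM and CCJM'));

$originalLength = strlen($node->data); // 13 characters before the split
$word = $node->splitText(9);           // split in front of the second "CCJM"

echo '"', $node->data, '"', PHP_EOL;   // "CCJM and " (only the text before the split remains)
echo '"', $word->data, '"', PHP_EOL;   // "CCJM" (moved into a new sibling text node)
echo strlen($node->data), ' of ', $originalLength, PHP_EOL; // 9 of 13
```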

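First, I don't think foreach ($matches as $group) is right here. If you check what $matches contains, it is the same set of matches twice: $matches[0] holds the full-pattern matches and $matches[1] holds the first capture group, which in your pattern wraps the entire expression. You presumably don't want to wrap each match in a span twice, so that outer foreach loop should be removed, and the inner loop should only iterate over $matches[0].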
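A quick check of that duplication, running the question's own pattern against a shortened version of the sample text (the shortened string is an assumption for illustration). With PREG_OFFSET_CAPTURE each entry is a [matched text, byte offset] pair, and because the whole pattern is a single capture group, group 1 repeats group 0 exactly.

```php
<?php
// The question's pattern: one capture group around the whole expression.
$pattern = '/(C\.? ?C\.?(?:JM| Johnson (?:&|&amp;|&#38;|and) Malhotra)(?: Engineers, LTD\.?|, P\.?C\.?)?)/i';
$text    = 'C.C. Johnson & Malhotra, P.C. (CCJM) and CCJM.';

preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

// $matches[0] = full-pattern matches, $matches[1] = capture group 1 (identical here).
foreach ($matches as $groupIndex => $group) {
    foreach ($group as [$matched, $offset]) {
        printf("group %d: %s at byte %d\n", $groupIndex, $matched, $offset);
    }
}
```

Iterating over both groups would therefore visit every match twice; $matches[0] alone is enough.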

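Second, I think your offset problem can be solved simply by working backwards: instead of replacing the found matches from first to last, replace them in reverse order. splitText() leaves $node holding only the text before the split point, so if you always split at the last remaining match, you only ever change the structure after the current position, and whatever changes there cannot affect the offsets of the earlier matches.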

        /**
         * Wrap each match in appropriate span, last match first
         */
        // The outer foreach ($matches as $group) is gone: only $matches[0] is needed
        $group = array_reverse($matches[0]);
        foreach ($group as $key => $match) {
            /**
             * Determine the offset and the length of the match
             */
            $offset = $match[1];
            $length = strlen($match[0]);

            /**
             * Isolate the match and what comes after it
             */
            $word  = $node->splitText($offset);
            $after = $word->splitText($length);

            /**
             * Create the wrapping span
             */
            $span = $DOM->createElement("span");
            $span->setAttribute("class", "__brand");

            /**
             * Replace the word with the span, and then re-insert the word within it
             */
            $word->parentNode->replaceChild($span, $word);
            $span->appendChild($word);

            // No break needed any more
        }

Here is the result I get with your sample input data (live example here: https://3v4l.org/kbSQ8):

<p><span class="__brand">C.C. Johnson &amp; Malhotra, P.C.</span> (<span
class="__brand">CCJM</span>) was an integral member of a large Design Team
for a 16.5-mile-long Public-Private Partnership (P3) Purple Line Project.
The east-west light rail system extends from New Carrollton in PG County,
MD to Bethesda in MO County, MD with 21 stations and one short tunnel.
<span class="__brand">CCJM</span> was Engineer of Record (EOR) for the
design of eight (8) Bridges and design reviews for 35 transit/highway
bridges and over 100 retaining walls of different lengths/types adjacent to
bridges and in areas of cut/fill. <span class="__brand">CCJM</span>
designed utility structures for 42,000 LF of relocated water mains and
19,000 LF of relocated sewer mains meeting Washington Suburban Sanitary
Commission (WSSC), Md Dept of Transportation (MDOT) MTA, and Local
Standards.</p>
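For reference, here is a self-contained sketch of the question's function with both changes applied, minus the WordPress hooks so it can be run directly. The name brand_wrap_html() is hypothetical, and the sketch assumes the mbstring extension is available. It goes one step beyond the answer: preg_match_all() reports offsets in bytes, while DOMText::splitText() counts characters (as far as I know), so the sketch converts them. For plain ASCII text like the sample the two are identical, but they would diverge as soon as a non-ASCII character precedes a match.

```php
<?php
// Hypothetical standalone version of ccjm_branding_filter(): wraps every match in
// <span class="__brand">, processing each text node's matches from last to first.
function brand_wrap_html(string $content, string $pattern): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // silence HTML5 parsing warnings
    $dom->loadHTML("<?xml encoding='utf-8' ?><html>{$content}</html>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // query() returns a snapshot list, so inserting nodes does not disturb the loop.
    foreach ($xpath->query('//text()[not(parent::script)]') as $node) {
        $text = $node->data;
        if (!preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
            continue; // no matches (0) or regex failure (false)
        }
        // Only full matches, last one first, so earlier offsets stay valid.
        foreach (array_reverse($matches[0]) as [$matched, $byteOffset]) {
            // Convert byte offset/length (preg) to character offset/length (splitText).
            $offset = mb_strlen(substr($text, 0, $byteOffset), 'UTF-8');
            $word = $node->splitText($offset);                // $node keeps the text before
            $word->splitText(mb_strlen($matched, 'UTF-8'));  // text after becomes a sibling

            $span = $dom->createElement('span');
            $span->setAttribute('class', '__brand');
            $word->parentNode->replaceChild($span, $word);
            $span->appendChild($word);
        }
    }

    // Serialize the children of the <html> wrapper, dropping the wrapper itself.
    $html = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

$pattern = '/(C\.? ?C\.?(?:JM| Johnson (?:&|&amp;|&#38;|and) Malhotra)(?: Engineers, LTD\.?|, P\.?C\.?)?)/i';
$html = brand_wrap_html('<p>C.C. Johnson &amp; Malhotra, P.C. (CCJM) and CCJM.</p>', $pattern);
echo $html, PHP_EOL;
```

Walking backwards means each splitText() call only ever cuts the part of the node that has not been processed yet, which is the whole point of the reversed loop.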