Set tags in HTML using DOMDocument and preg_replace_callback

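I'm trying to replace the words from my glossary with an (HTML) anchor so that each one gets a tooltip. I have the replacement part working, but I can't get the result back into the DOMDocument object.

I wrote a recursive function that walks the DOM: it visits every child node, searches for the words in my dictionary, and replaces them with an anchor.

I had been doing this on the raw HTML with a plain preg_match, but that ran into problems once the HTML became complex.

The recursive function: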

$terms = array(
   'example'=>'explanation about example'
);

function iterate_html($doc, $original_doc = null)
    {
    global $terms;

        if(is_null($original_doc)) {
            self::iterate_html($doc, $doc);
        }

        foreach($doc->childNodes as $childnode)
        {
            $children = $childnode->childNodes;
            if($children) {
                self::iterate_html($childnode);
            } else {

                $regexes = '~\b' . implode('\b|\b',array_keys($terms)) . '\b~i';
                $new_nodevalue = preg_replace_callback($regexes, function($matches) {
                    $doc = new DOMDocument();

                    $anchor = $doc->createElement('a', $matches[0]);
                    $anchor->setAttribute('class', 'text-info');
                    $anchor->setAttribute('data-toggle', 'tooltip');
                    $anchor->setAttribute('data-original-title', $terms[strtolower($matches[0])]);

                    return $doc->saveXML($anchor);

                }, $childnode->nodeValue);



                $dom = new DOMDocument();
                $template = $dom->createDocumentFragment();
                $template->appendXML($new_nodevalue);

                $original_doc->importNode($template->childNodes, true);
                $childnode->parentNode->replaceChild($template, $childnode);
            }
        }
    }

echo iterate_html('this is just some example text.');

I expect the result to be:

this is just some <a class="text-info" data-toggle="tooltip" data-original-title="explanation about example">example</a> text

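I don't think building a recursive function to walk the DOM is useful when you can use an XPath query. Also, I'm not sure preg_replace_callback is well suited to this case; I prefer to use preg_split. Here is an example: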

$html = 'this is just some example text.';

$terms = array(
   'example'=>'explanation about example'
);

// sort the keys by decreasing length
// (so that the longest term always wins, rather than whichever comes first in the pattern)

uksort($terms, function ($a, $b) {
    $diff = mb_strlen($b) - mb_strlen($a);

    return ($diff) ? $diff : strcmp($a, $b);
});

// build the pattern around a capture group (so that the matched terms are kept in the results with the PREG_SPLIT_DELIM_CAPTURE option)
$pattern = '~\b(' . implode('|', array_map(function($i) { return preg_quote($i, '~'); }, array_keys($terms))) . ')\b~i';

// prevent any HTML parse errors from being displayed
$libxmlInternalErrors = libxml_use_internal_errors(true);

// determine whether the HTML string already has a root html element; if not, add a fake root.
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$fakeRootElement = false;

if ( $dom->documentElement->nodeName !== 'html' ) {
    $dom->loadHTML("<div>$html</div>", LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
    $fakeRootElement = true;
}

libxml_use_internal_errors($libxmlInternalErrors);

// find all text nodes (not already included in a link or between other unwanted tags)
$xp = new DOMXPath($dom);
$textNodes = $xp->query('//text()[not(ancestor::a)][not(ancestor::style)][not(ancestor::script)]');

// replacement
foreach ($textNodes as $textNode) {
    $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
    $fragment = $dom->createDocumentFragment();
    foreach ($parts as $k=>$part) {
        if ($k&1) {
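            // createElement() does not escape its value argument (a term containing "&" would trigger a warning), so add the text as a node
            $anchor = $dom->createElement('a');
            $anchor->appendChild($dom->createTextNode($part));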
            $anchor->setAttribute('class', 'text-info');
            $anchor->setAttribute('data-toggle', 'tooltip');
            $anchor->setAttribute('data-original-title', $terms[strtolower($part)]);
            $fragment->appendChild($anchor);
        } else {
            $fragment->appendChild($dom->createTextNode($part));
        }
    }
    $textNode->parentNode->replaceChild($fragment, $textNode);
}


// build the result string
$result = '';

if ( $fakeRootElement ) {
    foreach ($dom->documentElement->childNodes as $childNode) {
        $result .= $dom->saveHTML($childNode);
    }
} else {
    $result = $dom->saveHTML();
}

echo $result;
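
To make the `$k&1` test in the loop easier to follow, here is a minimal sketch of what preg_split returns with PREG_SPLIT_DELIM_CAPTURE. The two-term glossary and the sample sentence are made up for illustration only. With a single capture group, the parts alternate: even indexes hold the text between matches, odd indexes hold the matched terms.

```php
<?php
// Hypothetical glossary, used only for this illustration.
$terms = [
    'example' => 'explanation about example',
    'text'    => 'explanation about text',
];

// Same pattern construction as above: one capture group around all the terms.
$pattern = '~\b(' . implode('|', array_map(function ($term) {
    return preg_quote($term, '~');
}, array_keys($terms))) . ')\b~i';

$parts = preg_split($pattern, 'Some Example text here', -1, PREG_SPLIT_DELIM_CAPTURE);

foreach ($parts as $k => $part) {
    // odd index: a captured glossary term; even index: the text in between
    echo $k, ($k & 1) ? ' term:  ' : ' plain: ', var_export($part, true), "\n";
}
// 0 plain: 'Some '
// 1 term:  'Example'
// 2 plain: ' '
// 3 term:  'text'
// 4 plain: ' here'
```

When the string starts with a term, the first part is an empty string, so the even/odd alternation still holds.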


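Feel free to put this into one or more functions/methods, but keep in mind that this kind of editing has a non-negligible cost: it should be performed each time the HTML is edited, not each time the HTML is displayed.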
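
As a rough sketch of what such a function could look like: the name add_glossary_tooltips is hypothetical, and the sketch assumes that $html is a fragment (not a full document with its own html element) and that the glossary keys are lowercase. It always wraps the input in a throwaway div instead of detecting the root element, and uses strlen rather than mb_strlen to avoid depending on mbstring.

```php
<?php
/**
 * Hypothetical helper: wraps glossary terms found in an HTML fragment in tooltip anchors.
 * Assumes $html is a fragment and that the keys of $terms are lowercase.
 */
function add_glossary_tooltips(string $html, array $terms): string
{
    if (trim($html) === '' || !$terms) {
        return $html;
    }

    // Longest terms first, so the longest alternative wins in the pattern.
    uksort($terms, function ($a, $b) {
        return (strlen($b) - strlen($a)) ?: strcmp($a, $b);
    });

    $pattern = '~\b(' . implode('|', array_map(function ($term) {
        return preg_quote($term, '~');
    }, array_keys($terms))) . ')\b~i';

    // Parse inside a throwaway <div> root, hiding libxml parse warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML("<div>$html</div>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Text nodes that are not already inside a link, a style or a script element.
    $xpath = new DOMXPath($dom);
    $query = '//text()[not(ancestor::a)][not(ancestor::style)][not(ancestor::script)]';

    foreach ($xpath->query($query) as $textNode) {
        $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no glossary term in this text node
        }

        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $k => $part) {
            if ($k & 1) { // odd index: a matched term
                $anchor = $dom->createElement('a');
                $anchor->appendChild($dom->createTextNode($part));
                $anchor->setAttribute('class', 'text-info');
                $anchor->setAttribute('data-toggle', 'tooltip');
                $anchor->setAttribute('data-original-title', $terms[strtolower($part)] ?? '');
                $fragment->appendChild($anchor);
            } elseif ($part !== '') { // even index: plain text
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }

    // Serialize the children of the throwaway root only.
    $result = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }

    return $result;
}

$out = add_glossary_tooltips(
    'this is just some example text.',
    ['example' => 'explanation about example']
);
echo $out, "\n";
```

The idea would be to call it once, when the HTML is saved, and store the returned string, so that displaying the page does not have to parse the DOM again.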