DOMDocument: newlines are preserved with span but not with img

See here: https://ideone.com/bjs3IC

Why are the newlines output correctly for span but not for img?

<?php
    outputImages();
    outputSpans();




    function outputImages(){
        $html = "<div class='test'>
                    <pre>
                    <img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
                    <img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
                    <img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
                    </pre>
                </div>";
        getHtml($html);
    }


    function outputSpans(){
        $html = "<div class='test'>
                    <pre>
                    <span>a</span>
                    <span>b</span>
                    <span>c</span>
                    </pre>
                </div>";
        getHtml($html);
    }


    function getHtml($html){
        $doc = new DOMDocument;
        $doc->loadhtml($html);
        $xpath = new DOMXPath($doc);
        $tags = $xpath->query('//div[@class="test"]');
        print(get_inner_html($tags[0]));
    }


    function get_inner_html( $node ) {
        $innerHTML= '';
        $children = $node->childNodes;
        foreach ($children as $child) {
            $innerHTML .= $child->ownerDocument->saveXML( $child );
        }

        return $innerHTML;
    }

The DOMDocument::loadHTML function takes a second parameter, options. It looks like LIBXML_NOBLANKS is (at least one of) the defaults there.

You can use

$doc->loadHTML($html, LIBXML_NOEMPTYTAG);

to override that default, and your code will then behave the same way for both examples.
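
To check what the parser actually kept under each load mode, one option is to list the child nodes of the `<pre>` element, with whitespace-only text nodes made visible. This is only a diagnostic sketch: `describeChildren` and `describePre` are hypothetical helper names, `x.png` is a placeholder image path, and no particular output is assumed here.

```php
<?php
// Hypothetical diagnostic helper: returns one label per child node,
// so that whitespace-only text nodes show up explicitly.
function describeChildren(DOMNode $node): array {
    $labels = [];
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            // json_encode makes newlines and spaces visible, e.g. "\n  "
            $labels[] = '#text ' . json_encode($child->nodeValue);
        } else {
            $labels[] = $child->nodeName;
        }
    }
    return $labels;
}

// Hypothetical wrapper: parse $html with the given libxml options
// and describe the children of the first <pre> element, if any.
function describePre(string $html, int $options = 0): array {
    $doc = new DOMDocument;
    $doc->loadHTML($html, $options);
    $pre = $doc->getElementsByTagName('pre')->item(0);
    return $pre === null ? [] : describeChildren($pre);
}

// 'x.png' is a placeholder, not the image URL from the question.
$html = "<div class='test'><pre>\n<img src='x.png'>\n<img src='x.png'>\n</pre></div>";

print_r(describePre($html));                    // no options passed
print_r(describePre($html, LIBXML_NOEMPTYTAG)); // option suggested above
```

Comparing the two printed lists shows whether the text nodes between the img elements survive parsing in each case.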

P.S.
I'm not sure why you are using

print(get_inner_html($tags[0]));

The $tags variable is a DOMNodeList, so you should use $tags->item(0) to get the first tag.
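
As a small illustration of this point, `item()` returns `null` when the index is out of range, so the result can be checked before it is used. The sketch below assumes nothing beyond the standard DOM extension; `firstMatch` is a hypothetical helper name introduced only for this example.

```php
<?php
// Hypothetical helper: return the first node matching an XPath
// expression, or null when nothing matches.
function firstMatch(DOMDocument $doc, string $expression) {
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($expression); // DOMNodeList, or false for an invalid expression
    if ($nodes === false) {
        return null;
    }
    return $nodes->item(0);              // null when the list is empty
}

$doc = new DOMDocument;
$doc->loadHTML("<div class='test'><span>a</span></div>");

$div = firstMatch($doc, '//div[@class="test"]');
if ($div !== null) {
    print($div->nodeName . "\n");
}
```

The explicit `null` check avoids passing a missing node on to a function such as get_inner_html when the query matches nothing.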

Your complete code should look like this:

outputImages();
outputSpans();

function outputImages() {
    $html = "<div class='test'>
                <pre>
                <img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
                <img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
                <img src='http://d...content-available-to-author-only...e.com/5x5/000/fff'>
                </pre>
            </div>";
    getHtml($html);
}

function outputSpans() {
    $html = "<div class='test'>
                <pre>
                <span>a</span>
                <span>b</span>
                <span>c</span>
                </pre>
            </div>";
    getHtml($html);
}

function getHtml($html) {
    $doc = new DOMDocument;
    // Pass an explicit option instead of relying on the defaults
    $doc->loadHTML($html, LIBXML_NOEMPTYTAG);
    $xpath = new DOMXPath($doc);
    $tags = $xpath->query('//div[@class="test"]');
    // item(0) fetches the first node of the DOMNodeList
    print(get_inner_html($tags->item(0)));
}

// Serialize every child of $node and concatenate the results
function get_inner_html( $node ) {
    $innerHTML= '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }
    return $innerHTML;
}