PHP can't read the media:content attribute

Question

I use the following PHP code to parse an RSS feed into HTML:

function get_rss_feed_as_html($feed_url, $max_item_cnt = 10, $show_date = true, $show_description = true, $max_words = 0, $cache_timeout = 7200, $cache_prefix = "/tmp/rss2html-")
    {
    $result = "";
    $rss = new DOMDocument();
    $cache_file = $cache_prefix . md5($feed_url);

    if ($cache_timeout > 0 &&
        is_file($cache_file) &&
        (filemtime($cache_file) + $cache_timeout > time())) {
            $rss->load($cache_file);
    } else {
        $rss->load($feed_url);
        if ($cache_timeout > 0) {
            $rss->save($cache_file);
        }
    }

    $feed = array();
    foreach ($rss->getElementsByTagName('entry') as $node) {
        
        $item = array (
            'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
            'desc' => $node->getElementsByTagName('content ')->item(0)->nodeValue,
            'content' => $node->getElementsByTagName('content')->item(0)->nodeValue,
            'link' => $node->getElementsByTagName('link')->item(0)->getAttribute('href'),
            'date' => $node->getElementsByTagName('updated')->item(0)->nodeValue,
            'media' => $node->getElementsByTagName('media:content')->item(0)->getAttribute('url'),
        );
        $content = $node->getElementsByTagName('encoded');
        if ($content->length > 0) {
            $item['content'] = $content->item(0)->nodeValue;
        }
        array_push($feed, $item);
    }

    if ($max_item_cnt > count($feed)) {
        $max_item_cnt = count($feed);
    }
    $result .= '<div class="bw-feedly-list">';
    for ($x=0;$x<$max_item_cnt;$x++) {
        $title = str_replace(' & ', ' &amp; ', $feed[$x]['title']);
        $link = $feed[$x]['link'];
        $result .= '<div class="bw-feedly-item-col">';
        $result .= '<a class="bw-feedly-item" href="'.$link.'" title="'.$title.'" target="_blank">';
        if ($show_date) {
            $date = date('F d, Y', strtotime($feed[$x]['date']));
            $result .= '<div class="bw-feedly-date">'.$date.'</div>';
        }
        
        $result .= '<strong class="bw-feedly-title">'.$title.'</strong>';
        
        if ($show_description) {
            $result .= '<div class="bw-feedly-row">';
            $result .= '<div class="bw-feedly-summary-col">';
            
            $description = $feed[$x]['content'];
            $content = $feed[$x]['content'];

            // no html tags
            $description = strip_tags(preg_replace('/(<(script|style)\b[^>]*>).*?(<\/\2>)/s', "", $description), '');
            // whether cut by number of words
            if ($max_words > 0) {
                $arr = explode(' ', $description);
                if ($max_words < count($arr)) {
                    $description = '';
                    $w_cnt = 0;
                    foreach($arr as $w) {
                        $description .= $w . ' ';
                        $w_cnt = $w_cnt + 1;
                        if ($w_cnt == $max_words) {
                            break;
                        }
                    }
                    $description .= " ...";
                }
            }
            
            $result .= '<div class="feed-description">' . $description . '</div>';
            
            $media = $feed[$x]['media'];
            
            // add img if it exists
            //if ($media !== '') {
                $result .= '<div class="bw-feedly-image-col">';
                $result .= '<div class="bw-feedly-image-wrap" style="background-image: url('. $media .');">';
                $result .= '<img class="bw-feedly-image" src="'. $media .'">';
                $result .= '</div></div>';
            //}
            
            $result .= '</div></div>';
        }
        $result .= '</div>';
    }
    $result .= '</a></div>';
    return $result;
}

It works fine, except for retrieving the media (URL) attribute:

'media' => $node->getElementsByTagName('media:content')->item(0)->getAttribute('url'),

It fails with the following error: Fatal error: Uncaught Error: Call to a member function getAttribute() on null in

Yet here I can read an attribute without any problem:

'link' => $node->getElementsByTagName('link')->item(0)->getAttribute('href')

Not every entry in the XML feed has a media element, but none of the null checks I added changed anything.

I also tried the code below. I think I'm close, but still no luck: it prints 'content is null' for every entry.

    if ($node->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->length > 0) {
        $image = $node->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url');
    } else {
        echo '<p>content is null</p>';
    }

An XPath expression didn't help either.

$xpath = new DOMXpath($rss);
$xpath->registerNamespace('m', 'http://search.yahoo.com/mrss/');

foreach ($xpath->evaluate('//entry') as $item) 
{
    $media = $xpath->evaluate('string(m:content/@url)', $item);
    echo '<p> MEDIA ITEM: '.$media.'</p>';
}

Here is part of the XML.

    <entry>
     <id>tag:04ac51c7-b707-43cc-8a73-c482da986a27</id>
     <title type="html">Lorem Ipsum</title>
     <published>2020-09-28T19:36:26Z</published>
     <updated>2020-09-28T06:01:22Z</updated>
     <link rel="alternate" href="https://www.lipsum.com/" type="text/html"/>
     <content type="html">Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. ...</content>
     <author>
     <name/>
     </author>
     <media:content medium="image" url="https://picsum.photos/200/300"/>
     <source>
     <id>tag:04ac51c7-b707-43cc-8a73-c482da986a27</id>
     <title type="html">Lorum ipsum</title>
     <link rel="alternate" type="text/html" href="https://www.lipsum.com/"/>
     <updated>2020-09-28T06:01:22Z</updated>
     </source>
    </entry>
    <entry>

What is the trick here?

Answer 1

It should work with the getElementsByTagNameNS function.

You should also be able to use getElementsByTagName without the namespace prefix, so leave out 'media':

$node->getElementsByTagName('content')->item(0)->getAttribute('url')

This will clash if elements named content exist in more than one namespace.
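To illustrate that clash, here is a minimal sketch using the classic DOMDocument class. It assumes the feed root declares the Atom namespace as default and binds the media prefix to http://search.yahoo.com/mrss/ (the question only shows the entry elements, so the feed wrapper is made up). The constant MRSS_NS and the variable names are illustrative only. With those assumptions, the bare name 'content' matches both the Atom content element and media:content, so the Atom one would come first; checking namespaceURI is one way to pick the right element.

```php
<?php
// Sample Atom fragment. The namespace declarations on <feed> are an assumption,
// because the question only shows the <entry> elements.
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <title type="html">Lorem Ipsum</title>
    <content type="html">Lorem Ipsum is simply dummy text.</content>
    <media:content medium="image" url="https://picsum.photos/200/300"/>
  </entry>
</feed>
XML;

// Illustrative constant name, not part of the original code.
const MRSS_NS = 'http://search.yahoo.com/mrss/';

$doc = new DOMDocument();
$doc->loadXML($xml);
$entry = $doc->getElementsByTagName('entry')->item(0);

// The prefixed name is compared against local names, so it finds nothing.
$prefixed = $entry->getElementsByTagName('media:content');

// The bare local name matches <content> in every namespace.
$bare = $entry->getElementsByTagName('content');

// Keep only the element that belongs to the Media RSS namespace.
$media_url = '';
foreach ($bare as $el) {
    if ($el->namespaceURI === MRSS_NS) {
        $media_url = $el->getAttribute('url');
        break;
    }
}

echo $prefixed->length, ' ', $bare->length, ' ', $media_url, PHP_EOL;
```

Calling item(0) on the bare-name list would return the Atom content element here, and getAttribute('url') on it would yield an empty string rather than the image URL.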

Answer 2

I got it working; I hope this helps someone else.

    $image = '';
    if($node->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->length > 0){
        $image = $node->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url');
    }
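For anyone who wants to try this in isolation, here is a self-contained sketch of the same check wrapped in a helper, run against one entry with an image and one without. The helper name entry_media_url, the constants, and the sample feed (including its namespace declarations) are assumptions made for the example, not part of the original code. It also shows an XPath variant; under the assumption that the feed uses the Atom default namespace, a plain //entry matches nothing, so the Atom namespace is registered as well.

```php
<?php
// Namespace URIs: Media RSS as used in the question, Atom assumed for the feed.
const MRSS_NS = 'http://search.yahoo.com/mrss/';
const ATOM_NS = 'http://www.w3.org/2005/Atom';

/**
 * Hypothetical helper: return the url attribute of the first media:content
 * element inside $entry, or an empty string when the entry has none.
 */
function entry_media_url(DOMElement $entry): string
{
    $media = $entry->getElementsByTagNameNS(MRSS_NS, 'content')->item(0);
    return $media instanceof DOMElement ? $media->getAttribute('url') : '';
}

// Made-up two-entry feed; only the first entry carries a media element.
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <title type="html">With image</title>
    <content type="html">Lorem Ipsum</content>
    <media:content medium="image" url="https://picsum.photos/200/300"/>
  </entry>
  <entry>
    <title type="html">Without image</title>
    <content type="html">Lorem Ipsum</content>
  </entry>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// DOM variant: one lookup per entry, never calling getAttribute() on null.
$urls = [];
foreach ($doc->getElementsByTagNameNS(ATOM_NS, 'entry') as $entry) {
    $urls[] = entry_media_url($entry);
}

// XPath variant: both namespaces need a registered prefix.
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('a', ATOM_NS);
$xpath->registerNamespace('m', MRSS_NS);
$xpath_urls = [];
foreach ($xpath->query('//a:entry') as $entry) {
    $xpath_urls[] = $xpath->evaluate('string(m:content/@url)', $entry);
}

var_dump($urls === $xpath_urls);
```

In the question's function, the 'media' array entry could call such a helper instead of chaining getAttribute() directly, and the image markup could then be skipped when the returned string is empty.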

php

rss