PHP get img src from xml

I have an XML page that looks like this:

<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
  <channel>
    <title>FB-RSS feed for Salman Khan  Fc</title>
    <link>http://facebook.com/profile.php?id=1636293749919827/</link>
    <description>FB-RSS feed for Salman Khan  Fc</description>
    <managingEditor>http://fbrss.com (FB-RSS)</managingEditor>
    <pubDate>31 Mar 16 20:00 +0000</pubDate>
    <item>
      <title>Photo - Who is the Best Khan ?</title>
      <link>https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3</link>
      <description>&lt;a href=&#34;https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3&#34;&gt;&lt;img src=&#34;https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/11059765_1713146978901170_8711054263905505442_n.jpg?oh=fa2978c5ecfb3ae424e9082aaa057b8f&amp;oe=57BB41D5&#34;&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;Who is the Best Khan ?</description>
      <author>FB-RSS</author>
      <guid>1636293749919827_1713146978901170</guid>
      <pubDate>31 Mar 16 20:00 +0000</pubDate>
    </item>
    <item>
      <title>Photo</title>
      <link>https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3</link>
      <description>&lt;a href=&#34;https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3&#34;&gt;&lt;img src=&#34;https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/12294686_1713146755567859_6728330714340999478_n.jpg?oh=6d90a688fdf4342f9e12e9ff9a66b127&amp;oe=57778068&#34;&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;</description>
      <author>FB-RSS</author>
      <guid>1636293749919827_1713146755567859</guid>
      <pubDate>31 Mar 16 19:58 +0000</pubDate>
    </item>
  </channel>
</rss>

I want to get the img src values from the XML above.

The images are stored in <description>. However, they are not in the form:

<img...

Instead, they look like this:

&lt;img src=&#34;https://scontent.xx.fbc...

The < is replaced with &lt;, which I guess is why $imgs = $dom->getElementsByTagName('img'); returns nothing.

Is there a workaround?

This is how I am calling it:

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadXML( $xml_file);
$imgs = ...(get the imgs to extract the src...('img') ??;

//Then run a possible foreach
//something like:

foreach($imgs as $img){

   $src= ///the src of the $img

   //try it out
   echo '<img src="'.$src.'" /> <br />';
}

Any ideas?

You have HTML embedded inside XML tags, so you have to retrieve the XML nodes, load the HTML from each one, and then retrieve the desired tag attribute.

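Your XML contains <description> nodes at different levels (one directly under <channel> and one in each <item>), so ->getElementsByTagName would return more nodes than you want. Use DOMXPath to retrieve only the <description> nodes at the right position in the tree: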

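$dom = new DOMDocument();
libxml_use_internal_errors( true );   // collect libxml warnings instead of printing them
$dom->loadXML( $xml );                // $xml holds the feed as a string
$dom->formatOutput = true;

$xpath = new DOMXPath( $dom );
// With no context node given, the query is relative to the root <rss> element
$nodes = $xpath->query( 'channel/item/description' );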
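
To make the difference concrete, here is a minimal sketch that counts the nodes both ways. The trimmed feed in $xml is a hypothetical stand-in for the one in the question, and the absolute path /rss/channel/item/description is used only for explicitness; for this feed it should select the same nodes as the relative path above.

```php
<?php
// Hypothetical, trimmed-down feed: one channel-level <description>
// and two item-level ones, mirroring the structure in the question.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <description>Feed description</description>
    <item><description>&lt;img src=&#34;a.jpg&#34;&gt;</description></item>
    <item><description>&lt;img src=&#34;b.jpg&#34;&gt;</description></item>
  </channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// The escaped <img> is plain text inside <description>, not an element.
$imgs = $dom->getElementsByTagName('img');

// Matches every <description> anywhere in the document.
$all = $dom->getElementsByTagName('description');

// Matches only the <description> children of <item>.
$xpath = new DOMXPath($dom);
$items = $xpath->query('/rss/channel/item/description');

echo $imgs->length, "\n";  // 0
echo $all->length, "\n";   // 3
echo $items->length, "\n"; // 2
```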

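Then iterate over the nodes, load each node value into a new DOMDocument (there is no need to decode the HTML entities, the DOM has already done that for you), and extract the src attribute from the <img> node: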

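foreach( $nodes as $node )
{
    $html = new DOMDocument();
    $html->loadHTML( $node->nodeValue );   // nodeValue is already entity-decoded HTML
    $img = $html->getElementsByTagName( 'img' )->item(0);
    if( $img === null ) continue;          // skip descriptions without an <img>
    $src = $img->getAttribute('src');
}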
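
Putting the two steps together, one possible way to package this is a small helper that returns every src as an array. This is only a sketch: the function name extractImageSources, the sample feed, and the example.com URLs are hypothetical, and it assumes PHP 7 or later for the type declarations.

```php
<?php
/**
 * Return the src of the first <img> in each item-level <description>.
 * Hypothetical helper name, not part of the original answer.
 */
function extractImageSources(string $xml): array
{
    // Silence libxml warnings (loadHTML is noisy on fragments) and
    // remember the previous setting so it can be restored afterwards.
    $previous = libxml_use_internal_errors(true);

    $sources = [];
    $dom = new DOMDocument();
    if ($dom->loadXML($xml)) {
        $xpath = new DOMXPath($dom);
        foreach ($xpath->query('/rss/channel/item/description') as $node) {
            if (trim($node->nodeValue) === '') {
                continue; // loadHTML() does not accept an empty string
            }
            $html = new DOMDocument();
            $html->loadHTML($node->nodeValue); // already entity-decoded
            $img = $html->getElementsByTagName('img')->item(0);
            if ($img !== null) {
                $sources[] = $img->getAttribute('src');
            }
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $sources;
}

// Hypothetical sample feed: one item with an image, one without.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <description>Feed description</description>
    <item><description>&lt;a href=&#34;https://example.com/p/1&#34;&gt;&lt;img src=&#34;https://example.com/1.jpg&#34;&gt;&lt;/a&gt;</description></item>
    <item><description>No image here</description></item>
  </channel>
</rss>
XML;

echo implode("\n", extractImageSources($xml)), "\n"; // https://example.com/1.jpg
```

Returning an array rather than echoing inside the loop keeps the extraction separate from the output, so the caller can decide whether to build <img> tags, download the files, or do something else with the URLs.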
