PHP: get img src from XML
I have an XML page that looks like this:
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
<channel>
<title>FB-RSS feed for Salman Khan Fc</title>
<link>http://facebook.com/profile.php?id=1636293749919827/</link>
<description>FB-RSS feed for Salman Khan Fc</description>
<managingEditor>http://fbrss.com (FB-RSS)</managingEditor>
<pubDate>31 Mar 16 20:00 +0000</pubDate>
<item>
<title>Photo - Who is the Best Khan ?</title>
<link>https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3</link>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1639997232882812.1073741827.1636293749919827/1713146978901170/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/11059765_1713146978901170_8711054263905505442_n.jpg?oh=fa2978c5ecfb3ae424e9082aaa057b8f&amp;oe=57BB41D5"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;Who is the Best Khan ?</description>
<author>FB-RSS</author>
<guid>1636293749919827_1713146978901170</guid>
<pubDate>31 Mar 16 20:00 +0000</pubDate>
</item>
<item>
<title>Photo</title>
<link>https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3</link>
<description>&lt;a href="https://www.facebook.com/SalmanKhanFns/photos/a.1636293813253154.1073741825.1636293749919827/1713146755567859/?type=3"&gt;&lt;img src="https://scontent.xx.fbcdn.net/hphotos-xap1/v/t1.0-0/s130x130/12294686_1713146755567859_6728330714340999478_n.jpg?oh=6d90a688fdf4342f9e12e9ff9a66b127&amp;oe=57778068"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;</description>
<author>FB-RSS</author>
<guid>1636293749919827_1713146755567859</guid>
<pubDate>31 Mar 16 19:58 +0000</pubDate>
</item>
</channel>
</rss>
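I want to get the src of each img in the XML above.
The images are stored inside <description>, but they are not in the form <img... Instead they look like &lt;img src="https://scontent.xx.fbc... The < is replaced with &lt;, which I guess is why $imgs = $dom->getElementsByTagName('img'); returns nothing.
Is there a workaround?
This is how I call it: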
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadXML( $xml_file);
$imgs = ...(get the imgs to extract the src...('img') ??;
//Then run a possible foreach
//something like:
foreach($imgs as $img){
$src= ///the src of the $img
//try it out
echo '<img src="'.$src.'" /> <br />';
}
Any ideas?
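You have HTML embedded inside XML tags, so you have to retrieve the XML nodes, load the content of each one as HTML, and then read the attribute you want from the resulting tag.
Your XML contains <description> nodes at different levels (one under <channel> and one under each <item>), so ->getElementsByTagName would return more nodes than you want. Use DOMXPath to retrieve only the <description> nodes at the right position in the tree: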
$dom = new DOMDocument();
libxml_use_internal_errors( True );
$dom->loadXML( $xml );
$dom->formatOutput = True;
$xpath = new DOMXPath( $dom );
$nodes = $xpath->query( 'channel/item/description' );
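To make the difference concrete, here is a minimal sketch that compares the two lookups. The feed in it is a trimmed-down, hypothetical stand-in for the one in the question (the example.com URLs are placeholders), and the query is written as an absolute path so it does not depend on the default context node:

```php
<?php
// Hypothetical sample data: a shortened stand-in for the feed in the question.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
<channel>
<title>Sample feed</title>
<description>Channel-level description</description>
<item>
<title>Photo</title>
<description>&lt;img src="https://example.com/a.jpg"&gt;</description>
</item>
<item>
<title>Photo</title>
<description>&lt;img src="https://example.com/b.jpg"&gt;</description>
</item>
</channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// Matches every <description> in the document, including the channel-level one.
$all = $dom->getElementsByTagName('description');

// Matches only the <description> nodes that sit directly under an <item>.
$xpath = new DOMXPath($dom);
$perItem = $xpath->query('/rss/channel/item/description');

echo $all->length, "\n";     // 3
echo $perItem->length, "\n"; // 2
```

The extra node returned by getElementsByTagName is the channel description, which contains no image and would only get in the way of the loop that follows.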
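Then loop over all the nodes, load each node value into a new DOMDocument (there is no need to decode the HTML entities; DOM has already decoded them for you), and extract the src attribute from the <img> node: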
foreach( $nodes as $node )
{
$html = new DOMDocument();
$html->loadHTML( $node->nodeValue );
$src = $html->getElementsByTagName( 'img' )->item(0)->getAttribute('src');
}
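Putting both steps together, a more defensive version might look like the sketch below. It assumes PHP 7 or later; extractImageSources() is a hypothetical helper name, and the sample feed and example.com URLs are made up for illustration. Unlike the loop above, it skips descriptions that are empty or contain no <img>, where calling ->item(0)->getAttribute() directly would fail:

```php
<?php
/**
 * Collect the src of the first <img> found in each item's <description>.
 * Hypothetical helper for illustration; not part of any library.
 */
function extractImageSources(string $xml): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $sources = [];
    $dom = new DOMDocument();
    if ($dom->loadXML($xml)) {
        $xpath = new DOMXPath($dom);
        foreach ($xpath->query('/rss/channel/item/description') as $node) {
            // nodeValue is already entity-decoded, so this is plain HTML.
            $fragment = trim($node->nodeValue);
            if ($fragment === '') {
                continue; // loadHTML() rejects empty input
            }
            $html = new DOMDocument();
            $html->loadHTML($fragment);
            $img = $html->getElementsByTagName('img')->item(0);
            if ($img !== null) {
                $sources[] = $img->getAttribute('src');
            }
        }
    }

    // Restore the previous libxml error handling.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $sources;
}

// Hypothetical sample data with one text-only item and one empty item.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0">
<channel>
<description>Channel-level description</description>
<item><description>&lt;a href="https://example.com/1"&gt;&lt;img src="https://example.com/a.jpg"&gt;&lt;/a&gt;&lt;br&gt;Caption</description></item>
<item><description>Text only, no image</description></item>
<item><description></description></item>
<item><description>&lt;img src="https://example.com/b.jpg"&gt;</description></item>
</channel>
</rss>
XML;

$sources = extractImageSources($xml);
echo implode("\n", $sources), "\n";
```

Returning an array rather than echoing inside the loop keeps the extraction separate from the output, so the caller can build the <img> tags from the question however it likes.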