Why is my XPath query not working?
I want to select the title and the YouTube link from the following XML:
`<?xml version="1.0" encoding="UTF-8"?><feed xmlns="http://www.w3.org/2005/Atom"><category term="videos" label="/r/videos"/> <icon>https://www.redditstatic.com/icon.png/</icon><id>/r/videos/.xml</id><link rel="self" href="https://www.reddit.com/r/videos/.xml" type="application/atom+xml" /><link rel="alternate" href="https://www.reddit.com/r/videos/" type="text/html" /><logo>https://a.thumbs.redditmedia.com/mtwnduVr0DnrK1o8rpTPi6waLWuPimj_8ntK8i5t890.png</logo><subtitle>A great place for video content of all kinds.</subtitle><title>Videos</title><entry><author><name>/u/LegendaryContent</name><uri>https://www.reddit.com/user/LegendaryContent</uri></author><category term="videos" label="/r/videos"/><content type="html"><table> <tr><td> <a href="https://www.reddit.com/r/videos/comments/45crp7/1400_employees_being_laid_off/"> <img src="https://b.thumbs.redditmedia.com/UR4XFRqoMtj5watvSUrUlEdTYiA1gOv_OxqxtxNyftQ.jpg" alt="1,400 Employees being laid off" title="1,400 Employees being laid off" /> </a> </td><td> &#32; submitted by &#32; <a href="https://www.reddit.com/user/LegendaryContent"> /u/LegendaryContent </a> <br/> <span><a href="https://youtu.be/Y3ttxGMQOrY">[link]</a></span> &#32; <span><a href="https://www.reddit.com/r/videos/comments/45crp7/1400_employees_being_laid_off/">[comments]</a></span> </td></tr></table></content><id>t3_45crp7</id><link href="https://www.reddit.com/r/videos/comments/45crp7/1400_employees_being_laid_off/" /><updated>2016-02-12T03:22:38+00:00</updated><title>1,400 Employees being laid off</title></entry></feed>`
Here is my code:
<?php
$videos = "";
$video_category = "Trending Videos";
$url = "https://www.reddit.com/r/videos/.xml";

// Load the Atom feed and collect every <entry>.
$feed_dom = new DOMDocument;
$feed_dom->load($url);
$feed_dom->preserveWhiteSpace = false;
$items = $feed_dom->getElementsByTagName('entry');

foreach ($items as $item) {
    $title = $item->getElementsByTagName('title')->item(0)->nodeValue;
    $desc_table = $item->getElementsByTagName('content')->item(0)->nodeValue;

    // Parse the HTML table embedded in <content>.
    $table_dom = new DOMDocument;
    $table_dom->loadHTML($desc_table);
    $xpath = new DOMXPath($table_dom);
    $table_dom->preserveWhiteSpace = false;

    // This is the query that returns nothing.
    $yt_link_node = $xpath->query("//table/tr/td[2]/a[2]");
    foreach ($yt_link_node as $yt_link) {
        $yt = $yt_link->getAttribute('href');
        echo $title;
        echo $yt;
    }
}
?>
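For some reason it does not work, even though I have tried almost every XPath query I could find on Google and Stack Overflow.
The title is echoed fine, but $yt is not.
Can you point out what I am doing wrong?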
That is because the DOM is slightly different from what you expect.
The HTML you are parsing there ($desc_table) generally has the following structure:
<table>
  <tr>
    <td>
      <a href="https://www.reddit.com/r/videos/comments/...">
        <img src="https://b.thumbs.redditmedia.com/....jpg"
             alt="..." title="..." />
      </a>
    </td>
    <td> &#32; submitted by &#32;
      <a href="https://www.reddit.com/user/..."> /u/... </a>
      <br/>
      <span>
        <a href="https://youtu.be/...">[link]</a>
      </span>
      &#32;
      <span>
        <a href="https://www.reddit.com/r/videos/comments/.../">[comments]</a>
      </span>
    </td>
  </tr>
</table>
So there is no second anchor element (a) that is a direct child of the second td element, because the second (and third) anchors are wrapped in span tags.
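A quick way to see the difference between direct children and descendants is to count the matches for each query. The snippet below is only a sketch: the `$html` string is a hand-written stand-in for the structure above, and its example.com URLs are placeholders, not values from the feed.

```php
<?php
// Hand-written sample mirroring the structure above; all URLs are placeholders.
$html = '<table><tr>'
      . '<td><a href="https://example.com/comments"><img src="https://example.com/t.jpg" alt="" /></a></td>'
      . '<td> submitted by <a href="https://example.com/user">/u/someone</a><br/>'
      . '<span><a href="https://example.com/video">[link]</a></span> '
      . '<span><a href="https://example.com/comments">[comments]</a></span></td>'
      . '</tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// "/" steps to direct children only: the second td has exactly one child <a>.
$children = $xpath->query('//table/tr/td[2]/a');
// So asking for the second direct-child <a> matches nothing.
$secondChild = $xpath->query('//table/tr/td[2]/a[2]');
// "//" steps to descendants at any depth, which also reaches the anchors inside the spans.
$descendants = $xpath->query('//table/tr/td[2]//a');

echo $children->length, "\n";    // 1
echo $secondChild->length, "\n"; // 0
echo $descendants->length, "\n"; // 3
```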
So if you want to reach this link:
<a href="https://youtu.be/...">[link]</a>
then use this XPath instead:
$yt_link_node = $xpath->query("//table/tr/td[2]/span[1]/a");
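Putting it together, a minimal sketch of the whole loop with the corrected query might look like this. The function name `extractVideoLinks` and the `$sampleFeed` string are assumptions made for illustration: the sample is a trimmed copy of the feed from the question, with the HTML wrapped in CDATA so it can be run without a network request.

```php
<?php
/**
 * Hypothetical helper: returns one ['title' => ..., 'link' => ...] row per <entry>.
 */
function extractVideoLinks(string $feedXml): array
{
    // Collect HTML parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $feed = new DOMDocument();
    $feed->loadXML($feedXml);

    $results = [];
    foreach ($feed->getElementsByTagName('entry') as $entry) {
        $title = $entry->getElementsByTagName('title')->item(0)->nodeValue;
        $contentHtml = $entry->getElementsByTagName('content')->item(0)->nodeValue;

        $table = new DOMDocument();
        $table->loadHTML($contentHtml);
        $xpath = new DOMXPath($table);

        // The [link] anchor sits inside the first <span> of the second <td>.
        $links = $xpath->query('//table/tr/td[2]/span[1]/a');
        if ($links->length > 0) {
            $results[] = [
                'title' => $title,
                'link'  => $links->item(0)->getAttribute('href'),
            ];
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $results;
}

// Assumed sample input: a trimmed stand-in for the feed in the question.
$sampleFeed = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Videos</title>
  <entry>
    <content type="html"><![CDATA[<table> <tr><td>
      <a href="https://www.reddit.com/r/videos/comments/45crp7/1400_employees_being_laid_off/">
        <img src="https://b.thumbs.redditmedia.com/UR4XFRqoMtj5watvSUrUlEdTYiA1gOv_OxqxtxNyftQ.jpg" alt="1,400 Employees being laid off" />
      </a> </td><td> submitted by
      <a href="https://www.reddit.com/user/LegendaryContent"> /u/LegendaryContent </a> <br/>
      <span><a href="https://youtu.be/Y3ttxGMQOrY">[link]</a></span>
      <span><a href="https://www.reddit.com/r/videos/comments/45crp7/1400_employees_being_laid_off/">[comments]</a></span>
    </td></tr></table>]]></content>
    <id>t3_45crp7</id>
    <title>1,400 Employees being laid off</title>
  </entry>
</feed>
XML;

foreach (extractVideoLinks($sampleFeed) as $row) {
    echo $row['title'], ' => ', $row['link'], "\n";
}
// 1,400 Employees being laid off => https://youtu.be/Y3ttxGMQOrY
```

Because the path depends on the exact table layout, it will stop matching if the markup changes. A query keyed on the anchor text, such as `//a[normalize-space(.)="[link]"]`, may be less brittle, though nothing in the feed guarantees that label either.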