XPath returns different output than the XPath Helper in the browser

Question

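I'm using XPath Helper to build my paths, but for the first time I'm getting completely wrong output. I wrote the following path to get the links to the current day's articles. For testing, I've hard-coded the current date.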

//b[contains(., '22/4 - 2015')]/parent::div/following-sibling::div[@class='newsItem']

Instead of returning each newsItem as it does in XPath Helper, it returns the entire page. How can that be? Here is my code:

function scrape() {
    $hltv = file_get_html("http://www.hltv.org/?pageid=96");
    foreach($hltv->find("//b[contains(., '22/4 - 2015')]/parent::div/following-sibling::div[@class='newsItem']") as $hltv_element) {
        echo $hltv_element;
    }
}

Answer 1

It's not entirely clear what result you expect, but here is the relevant piece of HTML, which will hopefully make things clearer:

<div style="margin-bottom:5px;margin-top:5px;">
                <b>22/4 - 2015</b>
            </div>
            <div class="newsItem">
                <a href="/news/14794-video-pyth-vs-dignitas" id="newsitem14794" title="Video: pyth vs. dignitas">
                    <span style="float:left;">
                        <img style="vertical-align: 1px;" src="http://static.hltv.org//images/mod_csgo.png" title="Counter-Strike: Global Offensive"/>
                        <img src="http://static.hltv.org//images/flag/se.gif" alt="" />&nbsp;</span> <span style="float:left;cursor: hand;width:350px;color:#000000"/>
                        <b>Video: pyth vs. dignitas</b>
                    </span>
                </a>
                <span style="float: right;">(22)</span>
            </div>
            <div style="clear:both"></div>
            <div class="newsItem"><a href="/news/14795-video-keev-vs-myxmg" id="newsitem14795" title="Video: keev vs. myXMG">
                <span style="float:left;">
                    <img style="vertical-align: 1px;" src="http://static.hltv.org//images/mod_csgo.png" title="Counter-Strike: Global Offensive"/>

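As you can see, there is one <b>22/4 - 2015</b> element, and it is selected. But its parent, the first div in the snippet, has more than one following sibling that is a div with @class="newsItem". Perhaps you meant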

//b[contains(., '22/4 - 2015')]/parent::div/following-sibling::div[@class='newsItem'][1]

"Is Simple HTML DOM using an old version of XPath, or?"

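In my opinion, none of the libraries with "simple" in their names (SimpleXML, Simple HTML DOM) are really that simple, and they often cause problems. All of these libraries use XPath 1.0, so that is not the issue. You're better off using DOMDocument and DOMXPath.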
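As a minimal sketch of that suggestion, the snippet below runs the expression through DOMDocument and DOMXPath, with and without the trailing [1]. The inline markup is a simplified stand-in for the page (an assumption, not the live HTML), and the variable names are only illustrative.

```php
<?php
// Simplified stand-in for the page markup (an assumption, not the live HTML).
$html = <<<'HTML'
<div id="wrap">
  <div style="margin-bottom:5px;margin-top:5px;"><b>22/4 - 2015</b></div>
  <div class="newsItem"><a href="/news/14794-video-pyth-vs-dignitas" title="Video: pyth vs. dignitas">Video: pyth vs. dignitas</a></div>
  <div style="clear:both"></div>
  <div class="newsItem"><a href="/news/14795-video-keev-vs-myxmg" title="Video: keev vs. myXMG">Video: keev vs. myXMG</a></div>
</div>
HTML;

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$base = "//b[contains(., '22/4 - 2015')]/parent::div/following-sibling::div[@class='newsItem']";

$allItems  = $xpath->query($base);          // every following newsItem sibling
$firstItem = $xpath->query($base . '[1]');  // only the nearest one

echo $allItems->length, " item(s) without [1]\n";
echo $firstItem->length, " item(s) with [1]\n";

// Print the markup of each matched div.
foreach ($allItems as $item) {
    echo $doc->saveHTML($item), "\n";
}
```

To run this against the real page, the HTML string would have to be fetched first (for example with file_get_contents); that step is left out here so the sketch stays self-contained.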

Edit

"Just to be clear: I want to get the titles of the news on the current date."

Then use

//b[contains(., '22/4 - 2015')]/parent::div/following-sibling::div[@class='newsItem'][1]/a/@title
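A rough sketch of reading those title attributes with DOMXPath follows. The helper name newsTitlesForDate and the sample markup are hypothetical and exist only for illustration; the expression itself is the one given above.

```php
<?php
// Hypothetical helper (not from the original answer): returns the title
// attributes matched by the expression above in the given HTML string.
function newsTitlesForDate(string $html, string $date): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate invalid real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // $date is interpolated directly; it is assumed to contain no single quotes.
    $query = "//b[contains(., '$date')]/parent::div"
           . "/following-sibling::div[@class='newsItem'][1]/a/@title";

    $titles = [];
    foreach ($xpath->query($query) as $attr) {
        $titles[] = $attr->nodeValue; // attribute node -> its string value
    }
    return $titles;
}

// Simplified sample markup (an assumption, not the live page).
$sample = '<div><div><b>22/4 - 2015</b></div>'
        . '<div class="newsItem"><a href="/news/14794-video-pyth-vs-dignitas" title="Video: pyth vs. dignitas">x</a></div>'
        . '<div class="newsItem"><a href="/news/14795-video-keev-vs-myxmg" title="Video: keev vs. myXMG">y</a></div></div>';

print_r(newsTitlesForDate($sample, '22/4 - 2015'));
```

Because of the [1], only the first newsItem after the date header contributes a title. Dropping it would return the title of every later newsItem sibling, which on a page listing several days may include items from other dates.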

php

xpath

simple-html-dom

web-scraping