Can only get one result with Dom Crawler

Question

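I'm trying to get everything inside the h2 elements within div id=firehoselist (to get the article titles), but the code below only returns the first result. Any ideas, please?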

    $crawler = new Crawler($content);

    $crawler->filterXPath('//div[@id="firehoselist"]//*')->each(function (Crawler $node) use (&$results) {
        $results[] = trim($node->filter('h2')->text());
    });

The content I'm scraping is too messy to post here, but it comes from the slashdot.org website.

Answer 1

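//div[@id="firehoselist"] matches every element whose id is firehoselist, and calling $node->filter('h2')->text() on that entry returns only the first result, because text() reads just the first node in the matched set.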

What you need is to select every #firehoselist h2 and read the text of each one:

    $crawler->filterXPath('//div[@id="firehoselist"]//h2')->each(function (Crawler $node) use (&$results) {
        $results[] = trim($node->text());
    });
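
For anyone who wants to try this without the live page, here is a minimal, self-contained sketch. The sample markup is hypothetical (it is not Slashdot's real HTML), and it assumes symfony/dom-crawler is installed via Composer. It contrasts calling text() on the whole matched set with iterating over each h2, and it uses the array that each() returns instead of a by-reference $results variable.

```php
<?php
// Assumption: symfony/dom-crawler was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical sample markup standing in for the real page.
$content = <<<'HTML'
<html><body>
<div id="firehoselist">
  <article><h2> First title </h2><p>Summary one</p></article>
  <article><h2>Second title</h2><p>Summary two</p></article>
</div>
<div id="other"><h2>Not wanted</h2></div>
</body></html>
HTML;

$crawler = new Crawler($content);

// text() on a multi-node set reads only the first matched node.
$first = trim($crawler->filterXPath('//div[@id="firehoselist"]//h2')->text());

// each() runs the closure once per matched h2 and returns
// an array of the closure's return values.
$titles = $crawler
    ->filterXPath('//div[@id="firehoselist"]//h2')
    ->each(function (Crawler $node) {
        return trim($node->text());
    });

print_r($titles);
```

Returning the value from the closure is only a stylistic alternative; the use (&$results) form in the answer above collects the same titles.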


dom

symfony

domcrawler