How to use file_get_contents to find 'a' and click 'a' to get inner contents

Question

I am building a crawler that fetches data from pakwheels.com, and I am able to get data from the site:

<?php 

    for ($y = 1; $y <= 5; $y++) {
        $pakwheels = file_get_contents('http://www.pakwheels.com/used-cars/search/-/?page=' . $y . '');
        $file2 = 'pakwheels.txt';
        file_put_contents($file2 , $pakwheels, FILE_APPEND);
    } 

?>

However, the requirements have changed. I now want to first get the contents of http://www.pakwheels.com/used-cars/search, which I am already doing. The problem is that I need logic that, after fetching the first page, follows the href of each ad's title link (the a elements listed in the list view), uses file_get_contents to save the whole content of that ad, then returns to the search page, i.e. http://www.pakwheels.com/used-cars/search?page=1, retrieves the second ad, and so on.

I have also tried an ajax a.clicked function, but I could not get the result I want.

If you need more information, I will provide it.

Answer 1

Use PHP cURL and PHP DOMDocument for this:

libxml_use_internal_errors(true);
for ($y = 1; $y <= 5; $y++) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_URL, 'http://www.pakwheels.com/used-cars/search/-/?page=' . $y);
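    $searchResults = curl_exec($ch);
    if ($searchResults === false || $searchResults === '') {
        // the request failed or returned nothing, so there is nothing to parse for this page
        curl_close($ch);
        continue;
    }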

    // save $searchResults here to a file or use DOMDocument to filter what you need

    $doc = new DOMDocument();
    $doc->loadHTML($searchResults);
    $links = $doc->getElementsByTagName('a');
    foreach($links as $link) {
        if($link->getAttribute('class') === 'car-name') {
            curl_setopt($ch, CURLOPT_URL, 'http://www.pakwheels.com' . $link->getAttribute('href'));
            $details = curl_exec($ch);

            // save $details here to a file or use DOMDocument to filter what you need

        }
    }
    curl_close($ch);
}
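The link-filtering step above compares the whole class attribute to 'car-name', so it would miss a link whose class attribute lists more than one class. Here is a minimal sketch of that step as a standalone function, using DOMXPath to match the class as a token and turning root-relative hrefs into absolute URLs. The function name extractAdLinks, its parameters, and the sample HTML and ad paths are hypothetical and only for illustration. It also assumes the class name is a plain token without quotes.

```php
<?php
// Hypothetical helper (not part of the answer above): returns the absolute URLs
// of every <a> element whose class list contains $className.
function extractAdLinks(string $html, string $baseUrl, string $className = 'car-name'): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    $previous = libxml_use_internal_errors(true); // silence warnings from real-world HTML
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return [];
    }

    $xpath = new DOMXPath($doc);
    // Token match on the class attribute, so class="car-name ad-title" also matches.
    $query = sprintf(
        "//a[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $className
    );

    $urls = [];
    foreach ($xpath->query($query) as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '') {
            continue;
        }
        // Keep absolute URLs as they are; prefix root-relative ones with the base URL.
        $urls[] = preg_match('#^https?://#i', $href)
            ? $href
            : rtrim($baseUrl, '/') . '/' . ltrim($href, '/');
    }

    return array_values(array_unique($urls));
}

// Made-up sample markup; the real page structure may differ.
$html = '<ul>'
      . '<li><a class="car-name ad-title" href="/used-cars/example-1">Ad 1</a></li>'
      . '<li><a class="other" href="/about">About</a></li>'
      . '<li><a class="car-name" href="http://www.pakwheels.com/used-cars/example-2">Ad 2</a></li>'
      . '</ul>';

$links = extractAdLinks($html, 'http://www.pakwheels.com');
print_r($links);
```

Each URL returned this way can then be passed to curl_setopt($ch, CURLOPT_URL, ...) in the inner loop, exactly as in the code above.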

If you need further explanation, feel free to ask ;-)

php

ajax

web-crawler