PHP: How to scrape content of the website based on Javascript

Question

I am trying to get the content of this website using the PHP simplehtmldom library.

http://www.immigration.govt.nz/migrant/stream/work/workingholiday/czechwhs.htm

It is not working, so I tried using cURL:

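function curl_get_file_contents($URL)
{
    $c = curl_init();
    // Return the response body as a string instead of printing it.
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($c, CURLOPT_URL, $URL);
    $contents = curl_exec($c);
    curl_close($c);

    // curl_exec() returns false on failure; an empty body is also treated as failure here.
    if ($contents) return $contents;
    else return FALSE;
}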

But I always get a response containing only some JS code and this content:

<noscript>Please enable JavaScript to view the page content.</noscript>
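For illustration, a minimal sketch of how a script might detect that it received this placeholder rather than the real page. The helper name looks_like_js_placeholder is hypothetical, and the check is only a heuristic based on the message shown above; other sites word their placeholder differently.

```php
<?php
// Hypothetical helper: guess whether the fetched HTML is only a
// "please enable JavaScript" placeholder instead of real content.
// Assumption: the placeholder contains a <noscript> tag and the
// phrase "enable JavaScript", as in the response quoted above.
function looks_like_js_placeholder(string $html): bool
{
    // stripos() is case-insensitive and returns false when not found.
    return stripos($html, '<noscript') !== false
        && stripos($html, 'enable JavaScript') !== false;
}

$response = '<noscript>Please enable JavaScript to view the page content.</noscript>';
if (looks_like_js_placeholder($response)) {
    echo "The page needs JavaScript to render its content.\n";
}
```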

Is it possible to solve this using PHP? I must use PHP in this case, so I need to simulate a JS-based browser.

Thank you very much for any advice.

Answer 1

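"I must use PHP in this case so I need to simulate JS based browser."

I can suggest two approaches:

1. Use the v8js PHP extension to deal with the site's JS when scraping. See here for a usage example.
2. Simulate a JS-based browser by using Selenium, iMacros, or the webRobots.io Chrome extension. In this case, however, you are no longer scripting in PHP.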
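As a rough illustration of the v8js approach, the sketch below pulls inline scripts out of fetched HTML and evaluates them with V8Js when the extension is installed. The helper names extract_inline_scripts and run_js and the sample HTML are assumptions made for this example, not part of the original answer. Note that V8Js provides only a JavaScript engine, not a browser DOM, so scripts that rely on document or window will fail unless those objects are stubbed.

```php
<?php
// Hypothetical helper: collect the bodies of inline <script> tags.
function extract_inline_scripts(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them,
    // since real-world markup is often malformed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $scripts = [];
    foreach ($doc->getElementsByTagName('script') as $node) {
        if ($node->hasAttribute('src')) {
            continue; // external scripts would need a separate fetch
        }
        $code = trim($node->textContent);
        if ($code !== '') {
            $scripts[] = $code;
        }
    }
    return $scripts;
}

// Hypothetical helper: evaluate JS with V8Js, or return null when the
// extension is missing or the script throws.
function run_js(string $code)
{
    if (!extension_loaded('v8js')) {
        return null;
    }
    try {
        $v8 = new V8Js();
        // executeString() returns the value of the last expression.
        return $v8->executeString($code);
    } catch (V8JsScriptException $e) {
        return null;
    }
}

// Sample input (an assumption); in practice this would be the cURL response.
$sample = '<html><body><script>var a = 6; a * 7;</script></body></html>';
foreach (extract_inline_scripts($sample) as $code) {
    var_dump(run_js($code));
}
```

Whatever the script computes (for example a cookie value or a redirect URL) would then have to be sent back in a follow-up cURL request; how that is done depends on the target site.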

Tags: javascript, php, curl, noscript, web-scraping