Extracting a value from a web page using Simple HTML DOM

Question

I searched the web and found a way to extract data using Simple HTML DOM, but it gives me the following errors:

Warning: file_get_contents(http://www.flipkart.com/moto-g-2nd-gen/p/itme6g3wferghmv3): failed to open stream: HTTP request failed! HTTP/1.1 500 Server Error in C:\Users\Abhishek\Desktop\editor\request\simple_html_dom.php on line 75

Fatal error: Call to a member function find() on boolean in C:\Users\Abhishek\Desktop\editor\request\main.php on line 9

The PHP code I wrote for it is:

<?php 

include('simple_html_dom.php');

$html = file_get_html('http://www.flipkart.com/moto-g-2nd-gen/p/itme6g3wferghmv3');


foreach($html->find('span.selling-price.omniture-field') as $e)
    echo $e->outertext . '<br>';

?>

I am new to programming and don't know much yet. Is there a mistake in my program?

Answer 1

Make sure the fopen wrappers are enabled (the allow_url_fopen setting in php.ini) for this to work. From the manual:

A URL can be used as a filename with this function if the fopen wrappers have been enabled.

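If this feature is disabled, file_get_contents() returns false, and that is what causes your second error: find() ends up being called on false instead of on a DOM object.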
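
As a rough sketch of the check described above, the two helpers below report whether the wrappers are enabled and turn a failed read into null instead of false. The function names url_fopen_enabled() and fetch_body() are hypothetical and are not part of PHP or Simple HTML DOM; only ini_get(), filter_var() and file_get_contents() are real PHP functions.

```php
<?php
// Hypothetical helper: true when URL-aware fopen wrappers are enabled.
function url_fopen_enabled(): bool
{
    // ini_get() returns the setting as a string such as "1", "" or "On".
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

// Hypothetical helper: returns the body as a string, or null on any failure,
// so the caller never hands false to a parser.
function fetch_body(string $source): ?string
{
    // @ hides the warning; the return value is checked explicitly instead.
    $body = @file_get_contents($source);
    return $body === false ? null : $body;
}
```

In the question's script the same idea means comparing the result of file_get_html() with false before calling find() on it.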

Answer 2

The server may be rejecting your request based on the User-Agent. Try fetching the page HTML with curl instead, i.e.

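<?php
$url = "http://www.flipkart.com/moto-g-2nd-gen/p/itme6g3wferghmv3";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// CURLOPT_USERAGENT takes only the header value, without a "User-Agent: " prefix
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the body instead of printing it
curl_setopt($ch, CURLOPT_ENCODING, "");       // accept any encoding curl supports
$pagebody = curl_exec($ch);
if ($pagebody === false) {
    die('Request failed: ' . curl_error($ch));
}
curl_close($ch);

include('simple_html_dom.php');
// str_get_html() can return false (for example on empty input), so check it
$html = str_get_html($pagebody);
if ($html === false) {
    die('Could not parse the page.');
}

foreach ($html->find('.selling-price') as $e)
    echo $e->outertext . '<br>';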

Output:

Rs. 10,999
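
If a number is needed rather than the displayed text, the string can be reduced to its digits. This is a minimal sketch: price_to_int() is a hypothetical helper, and it assumes the price is a whole amount with commas as thousands separators, as in the output shown above. A price with a decimal part would need different handling.

```php
<?php
// Hypothetical helper: converts text such as "Rs. 10,999" to an integer.
// Assumes a whole-number price; returns null when no digits are present.
function price_to_int(string $text): ?int
{
    // Drop the currency label, spaces and thousands separators.
    $digits = preg_replace('/\D+/', '', $text);
    if ($digits === null || $digits === '') {
        return null;
    }
    return (int) $digits;
}
```

Inside the loop it could be called on the element's text, for example price_to_int($e->plaintext).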

Note:

I can confirm that the server rejects your request based on the User-Agent.
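
For anyone who would rather stay with file_get_contents() than switch to curl, the User-Agent can also be set through a stream context. The sketch below is untested against the site in the question, and browser_context() is a hypothetical helper name. The user_agent, follow_location and timeout keys are standard HTTP context options in PHP; the timeout value is an arbitrary choice.

```php
<?php
// Hypothetical helper: builds a stream context that sends the given User-Agent.
function browser_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent'      => $userAgent, // header value only, no "User-Agent: " prefix
            'follow_location' => 1,          // follow redirects
            'timeout'         => 10,         // seconds; arbitrary choice
        ],
    ]);
}

// Usage (performs a network request, so it is left commented out here):
// $ua   = 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0';
// $body = file_get_contents($url, false, browser_context($ua));
// $html = ($body === false) ? false : str_get_html($body);
```

This route still requires the fopen wrappers to be enabled, which curl does not.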
