Is there a way to get round a 403 error with php file_get_contents?

Question

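I'm trying to fetch a specific web page with PHP's file_get_contents. Viewing the page directly works fine, but when I fetch it with PHP I get "failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden". I'm trying to extract one piece of data from the page.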

$ft = file_get_contents('https://www.vesselfinder.com/vessels/CELEBRITY-MILLENNIUM-IMO-9189419-MMSI-249055000');

echo $ft;

I've read on various pages here about using stream_context_create, mainly the user agent part:

$context = stream_context_create(
  array(
    "http" => array(
      "header" => "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
    )
  )
);

None of it worked, and now I get a 400 error instead. Unfortunately my server doesn't seem to be configured to use cURL, so file_get_contents looks like my only way to do this.
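Before ruling cURL out, it may be worth confirming what the PHP install actually supports. A minimal sketch that only uses built-in functions: it reports whether the curl extension is loaded, whether allow_url_fopen is on (file_get_contents needs it to open URLs), and whether the https:// stream wrapper is registered (this normally depends on the OpenSSL extension):

```php
<?php
// Is the cURL extension loaded in this PHP build?
$hasCurl = extension_loaded('curl');

// file_get_contents() can only open URLs when allow_url_fopen is enabled.
$urlFopen = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

// The https:// wrapper is only registered when OpenSSL support is available.
$httpsWrapper = in_array('https', stream_get_wrappers(), true);

printf("curl: %s\n", $hasCurl ? 'yes' : 'no');
printf("allow_url_fopen: %s\n", $urlFopen ? 'on' : 'off');
printf("https:// wrapper: %s\n", $httpsWrapper ? 'available' : 'missing');
```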

Answer 1

You need to include the header name, User-Agent:, in the actual header string:

$context  = stream_context_create(
  array(
    'http' => array(
      'header' => 'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
    ),
));
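If you need to send more than one header, the header option also accepts an array of complete "Name: value" strings instead of a single string. A minimal sketch; the extra Accept-Language header is only an illustrative assumption, not something the site is known to require. stream_context_get_options() is used here just to show what the context holds:

```php
<?php
// HTTP stream context that sends several request headers.
// Each array entry must be a full "Name: value" header line.
$context = stream_context_create(
  array(
    'http' => array(
      'header' => array(
        'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
        // Illustrative extra header (assumption: not required by the site).
        'Accept-Language: en-US,en;q=0.9',
      ),
    ),
  )
);

// Read the options back to check what the context will send.
$options = stream_context_get_options($context);
print_r($options['http']['header']);
```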

You can also use the user_agent option:

$context = stream_context_create(
  array(
    'http' => array(
      'user_agent' => 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
    ),
));
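The 400 in the question is most likely caused by the original header value, which contains only the browser string and no User-Agent: name, so the server receives a malformed header line. As a rough sanity check, a hypothetical helper such as is_header_line() (not a PHP built-in) can flag strings that lack a "Name: value" shape:

```php
<?php
// Hypothetical helper (not a PHP built-in): true if $line looks like an
// HTTP header field, i.e. a token name, a colon, then a non-empty value.
function is_header_line(string $line): bool
{
    // Header names are tokens: letters, digits and a few symbols, no spaces.
    return preg_match('/^[A-Za-z0-9!#$%&\'*+.^_`|~-]+:\s*\S/', $line) === 1;
}

$fromQuestion = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36';
$fromAnswer   = 'User-Agent: ' . $fromQuestion;

var_dump(is_header_line($fromQuestion)); // bool(false)
var_dump(is_header_line($fromAnswer));   // bool(true)
```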

Both examples above should work, and you should now be able to fetch the content with:

$content = file_get_contents('https://www.vesselfinder.com/vessels/CELEBRITY-MILLENNIUM-IMO-9189419-MMSI-249055000', false, $context);

echo $content;
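If the request still fails, it can help to see the real status code and response body rather than only a PHP warning. The ignore_errors HTTP context option makes file_get_contents return the body even for 4xx/5xx responses, and the HTTP wrapper fills $http_response_header in the calling scope. A minimal sketch; fetch_with_status() and status_code_from_headers() are hypothetical helper names:

```php
<?php
// Hypothetical helper: numeric status from the last "HTTP/x NNN" line.
// After redirects, the header list contains a status line per hop.
function status_code_from_headers(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Hypothetical helper: fetch a URL and return [status code, body].
function fetch_with_status(string $url, string $userAgent): array
{
    $context = stream_context_create(array(
        'http' => array(
            'user_agent'    => $userAgent,
            'ignore_errors' => true, // return the body even on 4xx/5xx
        ),
    ));
    $body = file_get_contents($url, false, $context);
    // Set by the HTTP wrapper in this local scope when a response arrives.
    $headers = isset($http_response_header) ? $http_response_header : array();
    return array(status_code_from_headers($headers), $body);
}
```

Usage: list($status, $body) = fetch_with_status($url, $userAgent); then only parse $body when $status is 200.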

You can of course also test this from the command line with curl. Note that we set our own User-Agent header:

curl --verbose -H 'User-Agent: YourApplication/1.0' 'https://www.vesselfinder.com/vessels/CELEBRITY-MILLENNIUM-IMO-9189419-MMSI-249055000'

It may also help to know that curl's default User-Agent appears to be blocked, so if you use curl you need to set your own with the -H flag.

Answer 2

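Vesselfinder, the service you are sending requests to, appears to refuse automated scraping of its data, as @ADyson said. See their documentation at https://www.vesselfinder.com/de/realtime-ais-data#rt-web-services: you can request an API token from them, though this may be part of a paid plan.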

They have an official API, for which you need an API key.
