file_get_contents() fails with special characters in URL

Question

I need to fetch some URLs that contain characters from the Swedish alphabet.

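If you take a string such as https://en.wikipedia.org/wiki/Åland_Islands and pass it directly as the argument to a file_get_contents call, it works fine. However, if you run the URL through urlencode first, the call fails with the message: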

failed to open stream: No such file or directory

This happens even though the documentation for file_get_contents says:

Note: If you're opening a URI with special characters, such as spaces, you need to encode the URI with urlencode().

For example, if you run the following code:

error_reporting(E_ALL);
ini_set("display_errors", true);

$url = urlencode("https://en.wikipedia.org/wiki/Åland_Islands");

$response = file_get_contents($url);
if($response === false) {
    die('file get contents has failed');
}
echo $response;

you will get the error message. If you simply remove "urlencode" from the code, it runs fine.

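The problem I am facing is that my URL contains a parameter taken from a submitted form. Since PHP always runs submitted values through urlencode, the Swedish characters in my constructed URL cause the error.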

How do I solve this?

Answer 1

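The problem is probably that urlencode is escaping your protocol (the first line is the original URL, the second is the urlencode output):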

https://en.wikipedia.org/wiki/Åland_Islands
https%3A%2F%2Fen.wikipedia.org%2Fwiki%2F%C3%85land_Islands
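The effect of the escaped protocol can be checked with a small sketch like the one below. It assumes the script file is saved as UTF-8, and the variable names are illustrative only. Once the "://" is percent-encoded, PHP no longer sees a URL scheme, so file_get_contents treats the string as a local file name, which would explain the "No such file or directory" message.

```php
<?php
// Assumes this source file is saved as UTF-8.
$raw = 'https://en.wikipedia.org/wiki/Åland_Islands';
$encoded = urlencode($raw);

// parse_url() returns null when the requested component is absent.
$rawScheme = parse_url($raw, PHP_URL_SCHEME);
$encodedScheme = parse_url($encoded, PHP_URL_SCHEME);

var_dump($rawScheme);     // the scheme of the untouched URL
var_dump($encodedScheme); // no scheme survives the full urlencode()
```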

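This is a problem I have run into as well. The only way around it is to limit the escaping to the parts that actually need to be escaped: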

https://en.wikipedia.org/wiki/Åland_Islands
https://en.wikipedia.org/wiki/%C3%85land_Islands

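Depending on where the special characters fall, this can be tricky. I usually opt for an encode-then-patch solution, but some people I work with prefer to encode only the dynamic parts of their URLs.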
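As a rough sketch of the "encode only the dynamic part" alternative, the fixed prefix can be left untouched while the variable path segment goes through rawurlencode. The helper name buildArticleUrl is hypothetical, and the sketch assumes a UTF-8 source file and a title that is a single path segment.

```php
<?php
// Hypothetical helper: only the dynamic path segment is encoded,
// so the scheme and the slashes of the fixed prefix stay intact.
function buildArticleUrl(string $title): string
{
    $base = 'https://en.wikipedia.org/wiki/';
    return $base . rawurlencode($title);
}

$url = buildArticleUrl('Åland_Islands');
echo $url, PHP_EOL;
```

rawurlencode is used here rather than urlencode because it encodes a space as %20 instead of +, which is the safer choice inside a path.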

Here is my approach (original URL, then the urlencode output, then the patched result):

https://en.wikipedia.org/wiki/Åland_Islands
https%3A%2F%2Fen.wikipedia.org%2Fwiki%2F%C3%85land_Islands
https://en.wikipedia.org/wiki/%C3%85land_Islands

Code:

$url = 'https://en.wikipedia.org/wiki/Åland_Islands';
$encodedUrl = urlencode($url);
$fixedEncodedUrl = str_replace(['%2F', '%3A'], ['/', ':'], $encodedUrl);
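Note that the patch above only restores "/" and ":". If the dynamic part is a form value that ends up in the query string, as in the question, one possible sketch is to let http_build_query encode just that value. The endpoint, the parameter name "search", and the $formValue variable are assumptions made for illustration; $formValue stands in for whatever the submitted form provides.

```php
<?php
// Stand-in for a submitted form value (assumes a UTF-8 source file).
$formValue = 'Åland Islands';

// Illustrative endpoint; only the query string is generated dynamically.
$base = 'https://en.wikipedia.org/w/index.php';

// PHP_QUERY_RFC3986 encodes spaces as %20 instead of +.
$query = http_build_query(['search' => $formValue], '', '&', PHP_QUERY_RFC3986);
$queryUrl = $base . '?' . $query;

echo $queryUrl, PHP_EOL;
```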

Hope this helps.

Answer 2

Use this:

$usableURL = mb_convert_encoding($url,'HTML-ENTITIES');
