PHP get_headers() fails with Pinterest
I am currently developing a tool that integrates links to different social networks:
Facebook: https://www.facebook.com/jonathan.parentlevesque
Google plus: https://plus.google.com/+JonathanParentL%C3%A9vesque
Instagram: https://instagram.com/mariloubiz/
Pinterest: https://www.pinterest.com/jonathan_parl/
RSS: https://regex101.com
Twitter: https://twitter.com/arcadefire
Vimeo: https://vimeo.com/ondemand/crashtest/135301838
Youtube: https://www.youtube.com/user/Darkjo666
I am using very basic regular expressions such as:
/^https?:\/\/(?:[a-z]{2}|[w]{3})?\.pinterest.com\/[\S]{5,}$/i
to do minimal domain validation on each link, on both the client and the server side.
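For illustration, the server-side half of that check might look like the sketch below. The function name `isPinterestUrl` is hypothetical, and the dot before `com` is escaped here so that it matches only a literal dot, which the pattern above does not do.

```php
<?php
// Hypothetical helper (not part of the original code): server-side domain check.
function isPinterestUrl($url)
{
    // Same pattern as above, except that the dot in "pinterest\.com" is escaped.
    $pattern = '/^https?:\/\/(?:[a-z]{2}|[w]{3})?\.pinterest\.com\/[\S]{5,}$/i';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $url) === 1;
}

var_dump(isPinterestUrl('https://www.pinterest.com/jonathan_parl/')); // bool(true)
var_dump(isPinterestUrl('https://www.pinterestXcom/jonathan_parl/')); // bool(false)
```

A check like this only says that the link looks like a Pinterest URL; it says nothing about whether the page exists.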
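Then I use this function to verify that the page actually exists (integrating social network links that do not work would be pointless, after all):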
public static function isUrlExists($url){
    $exists = false;
    if(!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")){
        $url = "https://" . $url;
    }
    if (preg_match(RegularExpression::URL, $url)){
        $headers = get_headers($url);
        if ($headers !== false and !empty($headers)){
            if (strpos($headers[0], '404') === false){
                $exists = true;
            }
        }
    }
    return $exists;
}
Note: in this function, I use Diego Perini's regular expression to validate the URL before sending the request:
const URL = "%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu"; //@copyright Diego Perini
Until now, none of the links I tested had produced any error, but testing Pinterest gave me this series of rather scary error messages:
get_headers(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Array
(
[url] => https://www.pinterest.com/jonathan_parl/
[exists] =>
)
get_headers(): Failed to enable crypto
Array
(
[url] => https://www.pinterest.com/jonathan_parl/
[exists] =>
)
get_headers(https://www.pinterest.com/jonathan_parl/): failed to open stream: operation failed
Array
(
[url] => https://www.pinterest.com/jonathan_parl/
[exists] =>
)
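Does anyone know what I am doing wrong?
I mean, isn't Pinterest a popular social network with a valid certificate (I don't use it personally; I only created an account for testing)?
Thank you for your help,
Jonathan Parent-Lévesque from Montreal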
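I tried creating a self-signed certificate for my development environment (XAMPP), as N.B. suggested in his comment. That solution did not work for me.
His other suggestion was to use cURL or Guzzle instead of get_headers(). Not only did it work, but according to this developer's tests: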
http://php.net/manual/fr/function.get-headers.php#104723
it is also much faster than get_headers().
For those who are interested, here is the code of my new function:
/**
 * Send an HTTP request to the $url and check the headers sent back.
 *
 * @param string $url          URL to which the request must be sent.
 * @param int[]  $failCodeList List of codes for which the page is considered invalid.
 *
 * @return bool
 */
public static function isUrlExists($url, array $failCodeList = array(404)){
    $exists = false;
    if(!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")){
        $url = "https://" . $url;
    }
    if (preg_match(RegularExpression::URL, $url)){
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($handle, CURLOPT_HEADER, true);
        curl_setopt($handle, CURLOPT_NOBODY, true);
        // CURLOPT_USERAGENT expects a string, not a boolean; this value is only an example.
        curl_setopt($handle, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; UrlChecker)");
        $headers = curl_exec($handle);
        curl_close($handle);
        if (empty($failCodeList) or !is_array($failCodeList)){
            $failCodeList = array(404);
        }
        if (!empty($headers)){
            $exists = true;
            // Only the first line (the status line) is inspected.
            $headers = explode(PHP_EOL, $headers);
            foreach($failCodeList as $code){
                if (is_numeric($code) and strpos($headers[0], strval($code)) !== false){
                    $exists = false;
                    break;
                }
            }
        }
    }
    return $exists;
}
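Let me explain the cURL options:
CURLOPT_RETURNTRANSFER: returns the response as a string instead of printing the fetched page to the screen.
CURLOPT_SSL_VERIFYPEER: when set to false, cURL does not verify the certificate.
CURLOPT_HEADER: includes the headers in the string.
CURLOPT_NOBODY: does not include the body in the string.
CURLOPT_USERAGENT: some sites require it to work properly (for example: https://plus.google.com).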
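Turning CURLOPT_SSL_VERIFYPEER off gets past the error, but it also means cURL will accept a forged certificate. A possible alternative, sketched below, keeps verification on and points cURL at a CA bundle file. The function name `fetchStatusCode`, the user agent string, and the bundle path are assumptions for illustration, not something from my project. It also reads the status code with curl_getinfo() instead of parsing the header string.

```php
<?php
// Hypothetical helper: returns the HTTP status code, or 0 if the request failed
// (for example because the host is unreachable or the certificate is rejected).
function fetchStatusCode($url, $caBundlePath)
{
    $handle = curl_init($url);
    curl_setopt_array($handle, array(
        CURLOPT_RETURNTRANSFER => true,          // return the response instead of printing it
        CURLOPT_NOBODY         => true,          // HEAD request: the body is not needed
        CURLOPT_SSL_VERIFYPEER => true,          // keep certificate verification on
        CURLOPT_CAINFO         => $caBundlePath, // CA bundle used to verify the peer
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; UrlChecker)', // example value
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => 10,
    ));
    curl_exec($handle);
    $code = (int) curl_getinfo($handle, CURLINFO_HTTP_CODE);

    // The handle is released when it goes out of scope at the end of the function.
    return $code;
}

// Assumed path; adjust it to wherever the CA bundle is stored on your machine.
var_dump(fetchStatusCode('https://www.pinterest.com/', 'C:/xampp/php/extras/ssl/cacert.pem'));
```

If the bundle path is wrong or the certificate cannot be verified, the function returns 0 rather than a status code, so a caller would need to treat 0 as "could not check" and not as "page exists".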
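Additional note: I explode the header string and use headers[0] so that only the return code and message are checked (for example: 200, 404, 405, etc.).
Additional note 2: sometimes checking only for code 404 is not enough (see the unit test), so there is an optional $failCodeList parameter for supplying the full list of codes to reject.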
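Because strpos() looks for the code anywhere in the status line, a stricter variant could extract the three-digit code first and then compare it against the list. This is only a sketch; `parseStatusCode` and `isAcceptedStatus` are hypothetical names, not functions from my project.

```php
<?php
// Hypothetical helper: extract the numeric status code from a status line
// such as "HTTP/1.1 404 Not Found". Returns null if the line does not match.
function parseStatusCode($statusLine)
{
    $pattern = '/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})\b/';
    if (preg_match($pattern, trim($statusLine), $matches) === 1) {
        return (int) $matches[1];
    }
    return null;
}

// Hypothetical helper: true when a code was parsed and it is not in the reject list.
function isAcceptedStatus($statusLine, array $failCodeList = array(404))
{
    $code = parseStatusCode($statusLine);
    $rejected = array_map('intval', $failCodeList); // tolerate codes given as strings

    return $code !== null && !in_array($code, $rejected, true);
}

var_dump(parseStatusCode("HTTP/1.1 404 Not Found\r"));                          // int(404)
var_dump(isAcceptedStatus('HTTP/1.1 200 OK'));                                  // bool(true)
var_dump(isAcceptedStatus('HTTP/1.1 405 Method Not Allowed', array(404, 405))); // bool(false)
```

Comparing integers avoids accidental matches on other digits in the line, and a status line that cannot be parsed is treated as a failure.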
And of course, here is the unit test that validates my code:
public function testIsUrlExists(){
    //invalid
    $this->assertFalse(ToolManager::isUrlExists("woot"));
    $this->assertFalse(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque4545646456"));
    $this->assertFalse(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque890800"));
    $this->assertFalse(ToolManager::isUrlExists("https://instagram.com/mariloubiz1232132/", array(404, 405)));
    $this->assertFalse(ToolManager::isUrlExists("https://www.pinterest.com/jonathan_parl1231/"));
    $this->assertFalse(ToolManager::isUrlExists("https://regex101.com/546465465456"));
    $this->assertFalse(ToolManager::isUrlExists("https://twitter.com/arcadefire4566546"));
    $this->assertFalse(ToolManager::isUrlExists("https://vimeo.com/**($%?%$", array(400, 405)));
    $this->assertFalse(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666456456456"));
    //valid
    $this->assertTrue(ToolManager::isUrlExists("www.google.ca"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
    $this->assertTrue(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque"));
    $this->assertTrue(ToolManager::isUrlExists("https://instagram.com/mariloubiz/"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.pinterest.com/"));
    $this->assertTrue(ToolManager::isUrlExists("https://regex101.com"));
    $this->assertTrue(ToolManager::isUrlExists("https://twitter.com/arcadefire"));
    $this->assertTrue(ToolManager::isUrlExists("https://vimeo.com/"));
    $this->assertTrue(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666"));
}
I hope this solution helps someone,
Jonathan Parent-Lévesque from Montreal