Links that don't look like links
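I have some code that gets all the links on a page, but some of the links it returns don't look like links. For example, indices 0-4 contain "javascript:void(0)", and index 5 contains a link that is just "/". How can I fix this? Thanks.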
$content = file_get_contents("http://bestspace.co"); //get content of page
$links = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>"; //set regular expression to get links ("\\1" so PHP passes the backreference \1 to PCRE instead of the octal escape chr(1))
preg_match_all("/$links/siU", $content, $matches); //get all links on page and store in array $matches[2]
print_r($matches[2]);
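Regular expressions are fragile for HTML. For example, the pattern above only expects optional double quotes around the href value. As an alternative, here is a minimal sketch that collects the href values with PHP's built-in DOMDocument instead. It assumes the dom extension is enabled, as it is in most stock PHP builds. The function name extractHrefs and the sample markup are hypothetical, chosen for illustration.

```php
<?php
// Hypothetical helper: return every href attribute found on <a> elements.
// Assumes the dom extension is available.
function extractHrefs(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML rarely validates; collect libxml warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // Skip anchors such as <a name="top"> that have no href at all.
        if ($a->hasAttribute('href')) {
            $hrefs[] = $a->getAttribute('href');
        }
    }
    return $hrefs;
}

// Sample markup: double-quoted, single-quoted, and missing href attributes.
$sample = '<p><a href="javascript:void(0)">Menu</a> <a href=\'/about-us\'>About</a> <a name="top">x</a></p>';
print_r(extractHrefs($sample));
```

With the question's code, this would be called as extractHrefs(file_get_contents("http://bestspace.co")). The result still contains the javascript: entries, so filtering is still needed.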
Contents of the array:
Array (
[0] => javascript:void(0)
[1] => javascript:void(0)
[2] => javascript:void(0)
[3] => javascript:void(0)
[4] => javascript:void(0)
[5] => /
[6] => /bestdeals
[7] => /about-us
[8] => /why-choose-us
[9] => /products
[10] => https://cloud.bestspace.co/clientarea.php
etc. )
Use array_filter to remove all the JavaScript links.
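// Keep only entries that do not start with "javascript:" (11 characters); array_filter preserves the original keys.
$links = array_filter($matches[2], function($x) {
return substr($x, 0, 11) != 'javascript:';
});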
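The "/" entry is not actually blank. It is a root-relative link to the site's home page, just as "/bestdeals" points to a page on the same host. Here is a minimal sketch that extends the filter above: it drops javascript: links case-insensitively, skips empty and "#" anchors, turns root-relative paths into absolute URLs, and reindexes the result. The function name cleanLinks is hypothetical. The base URL "http://bestspace.co" is taken from the question and is an assumption about where the links are resolved. Protocol-relative ("//host/...") and other relative forms are deliberately left untouched.

```php
<?php
// Hypothetical helper: filter out non-navigating links and absolutize
// root-relative ones against an assumed base URL.
function cleanLinks(array $hrefs, string $base): array
{
    $base = rtrim($base, '/');
    $out = [];
    foreach ($hrefs as $href) {
        $href = trim($href);
        // Skip empty values, in-page anchors, and javascript: pseudo-links
        // (stripos makes the check case-insensitive, e.g. "JavaScript:").
        if ($href === '' || $href[0] === '#' || stripos($href, 'javascript:') === 0) {
            continue;
        }
        // "/" and "/about-us" are relative to the site root; "//host" is not.
        if ($href[0] === '/' && substr($href, 0, 2) !== '//') {
            $href = $base . $href;
        }
        $out[] = $href;
    }
    // Drop duplicates, then reindex so the keys run 0, 1, 2, ...
    return array_values(array_unique($out));
}

$links = cleanLinks(
    ['javascript:void(0)', 'JavaScript:void(0)', '/', '/bestdeals', '#top',
     'https://cloud.bestspace.co/clientarea.php', '/'],
    'http://bestspace.co'
);
print_r($links);
```

Applied to the question's output, this would be cleanLinks($matches[2], "http://bestspace.co"). Unlike the substr() check, stripos() also catches mixed-case variants such as "JavaScript:".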