PHP: Strip domain name from URL
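I know there is a lot of information about this topic out there, but I can't seem to get it working the way I want.
I'm trying to build a function that strips a URL down to the bare domain name:
http://blabla.com      -> blabla
www.blabla.net         -> blabla
http://www.blabla.eu   -> blabla
I only need the plain name of the domain.
Using parse_url I filtered out the domain, but that's not enough.
I have 3 functions that strip down the domain, but I still get some wrong output: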
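function prepare_array($domains)
{
    // Split the raw input into lines and trim each entry
    $prep_domains = explode("\n", str_replace("\r", "", $domains));
    $domain_array = array_map('trim', $prep_domains);
    return $domain_array;
}

function test($domain)
{
    // Returns the second dot-separated part:
    // "www.blabla.net" -> "blabla", but "blabla.com" -> "com"
    $domain = explode(".", $domain);
    return $domain[1];
}

function strip($url)
{
    // Remove a leading "http://" and "www.", then everything from the first "/"
    $url = trim($url);
    $url = preg_replace("/^(http:\/\/)*(www.)*/is", "", $url);
    $url = preg_replace("/\/.*$/is", "", $url);
    return $url;
}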
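All possible domains, URLs and extensions should be allowed. When the function finishes, it must return an array containing only the domain names themselves.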
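A minimal sketch of the whole pipeline asked for here (newline-separated input in, array of bare names out), assuming every domain ends in a single-label TLD such as .com, .net or .eu. The function name `domain_names` is hypothetical. It reuses the line splitting from `prepare_array()` and lets `parse_url()` extract the host, prepending `http://` when the input has no scheme, because `parse_url()` only recognises a host after `//`.

```php
<?php
// Hypothetical helper (not from the question): newline-separated URLs in,
// array of bare names out. Assumes single-label TLDs such as .com, .net, .eu.
function domain_names(string $domains): array
{
    // Same splitting as prepare_array(), plus dropping empty lines.
    $lines = array_filter(
        array_map('trim', explode("\n", str_replace("\r", "", $domains))),
        'strlen'
    );

    $names = [];
    foreach ($lines as $line) {
        // parse_url() only recognises a host after "//", so add a scheme if missing.
        if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $line)) {
            $line = 'http://' . $line;
        }
        $host = parse_url($line, PHP_URL_HOST);
        if (!is_string($host) || $host === '') {
            continue; // malformed entry: skip it
        }
        // Keep the label just before the TLD: "www.blabla.net" -> "blabla".
        $labels = explode('.', strtolower($host));
        $names[] = count($labels) > 1 ? $labels[count($labels) - 2] : $labels[0];
    }
    return $names;
}
```

For the three sample URLs above this yields `['blabla', 'blabla', 'blabla']`. Under the single-label TLD assumption, `blabla.co.uk` would wrongly give `co`, which is the multi-part TLD problem raised in one of the answers.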
Update:
Thanks for all the suggestions!
With everyone's help I figured it out.
function test($url)
{
    // Strip a leading "http://" or "https://" and/or "www." if present.
    // preg_replace() returns the string unchanged when nothing matches,
    // so a separate preg_match() check is not needed.
    $domain = preg_replace('/^(https?:\/\/)?(www\.)?/i', '', trim($url));

    // Now all that's left is the domain and the extension.
    // Only return the first part, without the extension
    // (note: for "sub.blabla.com" this is "sub").
    $domain = explode(".", $domain);
    return $domain[0];
}
How about:
$wsArray = explode(".",$domain); //Break it up into an array.
$extension = array_pop($wsArray); //Get the Extension (last entry)
$domain = array_pop($wsArray); // Get the domain
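The same idea wrapped as a function, assuming `$host` already holds only the host name (for example from `parse_url($url, PHP_URL_HOST)`). The name `second_level_label` is hypothetical. Taking the label before the last one copes with `www.` and other subdomains, but for a two-part suffix such as `.co.uk` it returns `co`.

```php
<?php
// Hypothetical wrapper around the array_pop() approach.
// Assumes $host contains only a host name, e.g. "www.blabla.net".
function second_level_label(string $host): string
{
    $parts = explode('.', $host); // "www.blabla.net" -> ["www", "blabla", "net"]
    array_pop($parts);            // discard the extension (last label)
    $label = array_pop($parts);   // the label just before it, or null if none
    // A host without any dot (e.g. "localhost") has no extension to drop.
    return $label ?? $host;
}
```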
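Ah, your problem is that a TLD can consist of one or two parts, e.g. .com versus .co.uk.
What I would do is maintain a list of TLDs. Take the result of parse_url, loop through the list and look for a match. Strip off the TLD, then explode what remains on '.'; the last part will be in the format you want.
This doesn't seem as efficient as it could be, but since TLDs are being added all the time, I can't see any other deterministic way.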
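A sketch of that idea, assuming a tiny hard-coded suffix list. `$suffixes` here is illustrative only: a maintained list such as Mozilla's Public Suffix List has thousands of entries and changes over time. Suffixes are tried longest first so that `co.uk` wins over `uk`; the function name `strip_known_tld` is hypothetical, and the arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical sketch: strip a known TLD, then take the last remaining label.
function strip_known_tld(string $url, array $suffixes): ?string
{
    // parse_url() needs a scheme to recognise the host.
    $host = parse_url(strpos($url, '://') === false ? "http://$url" : $url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    $host = strtolower($host);

    // Longest suffixes first, so "co.uk" is tried before "uk".
    usort($suffixes, fn($a, $b) => strlen($b) <=> strlen($a));

    foreach ($suffixes as $suffix) {
        $tail = '.' . $suffix;
        if (substr($host, -strlen($tail)) === $tail) {
            // Remove the TLD, explode on '.', keep the last part.
            $rest = explode('.', substr($host, 0, -strlen($tail)));
            return end($rest);
        }
    }
    return null; // no known TLD matched
}

$suffixes = ['com', 'net', 'eu', 'uk', 'co.uk']; // illustrative, not complete
```

With this list, `http://www.bbc.co.uk/news` gives `bbc`, while `example.org` gives `null` because `org` is not listed.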
Try preg_replace.
Something like:
$domain = preg_replace($regex, '$1', $url);
regex
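The answer's actual `$regex` is not shown. As an assumption, a pattern with one capture group that matches the whole URL and keeps only the name before the first dot could look like this; the pattern, the constant `NAME_REGEX` and the wrapper `name_via_regex` are illustrative, not the answerer's. Like the asker's own `test()`, it returns the first label, so `sub.blabla.com` gives `sub`.

```php
<?php
// Assumed pattern (not from the original answer):
//   optional scheme, optional "www.", capture the first label, then consume the rest.
const NAME_REGEX = '~^(?:https?://)?(?:www\.)?([^./]+)\..*$~i';

function name_via_regex(string $url): string
{
    // '$1' replaces the whole match with the captured label.
    // If the pattern does not match (e.g. no dot at all), the input is returned unchanged.
    return preg_replace(NAME_REGEX, '$1', trim($url));
}
```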
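OK... this is messy, and you should spend some time optimizing it and caching previously derived domains. You also need a friendly name server, and one last catch is that the domain must have an "A" record in its DNS.
This tries to assemble the domain name in reverse order until it resolves to a DNS "A" record.
Anyway, this was bugging me, so I hope this answer helps: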
<?php
$wsHostNames = array(
    "test.com",
    "http://www.bbc.com/news/uk-34276525",
    "google.uk.co"
);

foreach ($wsHostNames as $hostName) {
    echo "checking $hostName" . PHP_EOL;

    // Attempt to strip a full URL down to just the host
    $wsWork = parse_url($hostName, PHP_URL_HOST);
    if (!empty($wsWork)) {
        echo "Was able to cleanup $wsWork" . PHP_EOL;
        $hostName = $wsWork;
    } else {
        // parse_url() only finds a host when there is a scheme ("http://"),
        // so a bare name like "test.com" ends up here. Try to check it anyway.
        echo "No path to strip from $hostName" . PHP_EOL;
    }

    $wsArray = explode(".", $hostName); // Break it up into an array.
    $wsHostName = "";

    // Build the domain one segment at a time, starting from the right.
    // The code could be modified not to check the first segment (.com) on its own.
    while (!empty($wsArray)) {
        $newSegment = array_pop($wsArray);
        $wsHostName = $newSegment . $wsHostName;
        echo "Checking $wsHostName" . PHP_EOL;
        if (checkdnsrr($wsHostName, "A")) {
            echo "host found $wsHostName" . PHP_EOL;
            echo "Domain is $newSegment" . PHP_EOL;
            continue 2; // move on to the next host name
        }
        // This segment didn't resolve: prepend a dot and keep building
        echo "No Valid A Record for $wsHostName" . PHP_EOL;
        $wsHostName = "." . $wsHostName;
    }

    // If you get here, the host name could not be resolved
}
?>
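The reverse-assembly loop can be pulled out into a function so it can be exercised without network access. In this sketch the function name `find_registered_name` is hypothetical, and the DNS check is passed in as a callable: passing `fn($h) => checkdnsrr($h, 'A')` reproduces the lookup used above, while a fake resolver lets it run offline. As the comment in the answer suggests, it skips checking the bare TLD on its own.

```php
<?php
// Hypothetical refactor of the loop above: build the host from the right,
// one label at a time, until $resolves() accepts it.
// Returns the label that completed the first resolvable name, or null.
function find_registered_name(string $hostName, callable $resolves): ?string
{
    $labels = explode('.', $hostName);
    $candidate = array_pop($labels); // start from the TLD, e.g. "com"

    // The bare TLD itself is never checked.
    while (!empty($labels)) {
        $segment = array_pop($labels);
        $candidate = $segment . '.' . $candidate; // "bbc" + "." + "com"
        if ($resolves($candidate)) {
            return $segment;
        }
    }
    return null; // nothing resolved
}

// Real DNS lookup, as in the answer (needs network access and a working resolver):
// $name = find_registered_name('www.bbc.com', fn($h) => checkdnsrr($h, 'A'));
```

Because `co.uk` itself has no A record in the fake resolver, `news.bbc.co.uk` keeps building until `bbc.co.uk` and returns `bbc`; with real DNS the result depends on what actually resolves.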