Get domain with its subdomain from URL
I am using this function to get the domain and subdomain from a string. But if the string is already in the format I expect, it returns null.
function getDomainFromUrl($url) {
    $host = parse_url($url, PHP_URL_HOST);
    return preg_replace('/^www\./', '', $host);
}
$url = "http://abc.example.com/" -> abc.example.com | OK
$url = "http://www.example.com/" -> example.com | OK
$url = "abc.example.com" -> FAILS!
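The problem is that parse_url returns null for the host here: without a scheme, the whole string is parsed as a path, so there is no host component. Check that you actually got a host back before using it, otherwise $host is empty.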
<?php
function getDomainFromUrl($url) {
    $host = parse_url($url, PHP_URL_HOST);
    // Fall back to the raw string when parse_url finds no host
    if ($host == '') {
        $host = $url;
    }
    return preg_replace('/^www\./', '', $host);
}
echo getDomainFromUrl("http://abc.example.com/") . "\n";
echo getDomainFromUrl("http://www.example.com/") . "\n";
echo getDomainFromUrl("abc.example.com");
Output:
abc.example.com
example.com
abc.example.com
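To see why the scheme-less string has no host, it can help to look at everything parse_url returns rather than just PHP_URL_HOST. The following is only a small sketch; describeUrl is a hypothetical helper name introduced for illustration and is not part of the answer above.

```php
<?php
// Hypothetical helper for illustration: list the components parse_url() finds.
function describeUrl(string $url): array
{
    $parts = parse_url($url);
    // parse_url() returns false only for seriously malformed input.
    return $parts === false ? [] : $parts;
}

// With a scheme: the result contains "scheme", "host" and "path" keys.
var_dump(describeUrl('http://abc.example.com/'));

// Without a scheme: the whole string ends up under the "path" key.
var_dump(describeUrl('abc.example.com'));

// Asking for a component that is not present yields null.
var_dump(parse_url('abc.example.com', PHP_URL_HOST));
```

Note that falling back to the raw string, as the function above does, also returns any path that follows the host, so an input such as abc.example.com/page would come back unchanged.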
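That is because parse_url does not see a host in abc.example.com when the scheme is missing, so you need to check for that first. You can do something simple like this: if the URL has no protocol, add it: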
function addhttp($url) {
    // Prepend http:// unless the URL already starts with http(s):// or ftp(s)://
    if (!preg_match("~^(?:f|ht)tps?://~i", $url)) {
        $url = "http://" . $url;
    }
    return $url;
}
function getDomainFromUrl($url) {
    $host = parse_url($url, PHP_URL_HOST);
    if (!$host) {
        // not a URL with a protocol: add one and parse once more
        $host = parse_url(addhttp($url), PHP_URL_HOST);
    }
    // still no host (e.g. empty input): return null instead of recursing forever
    return $host ? preg_replace('/^www\./', '', $host) : null;
}
The parse_url() function does not work reliably with relative URLs. You can test whether a scheme is present and add a default one if it is not:
if ( !preg_match( '/^([^\:]+)\:\/\//', $url ) ) $url = 'http://' . $url;
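Putting that check together with parse_url might look like the sketch below. The function name hostFromLooseUrl, the $defaultScheme parameter, and the stricter scheme pattern (a letter followed by letters, digits, "+", "-" or ".") are assumptions made for this example, not something taken from the answer above.

```php
<?php
// Hypothetical helper: prepend a default scheme when none is present, then parse.
function hostFromLooseUrl(string $url, string $defaultScheme = 'http'): ?string
{
    $url = trim($url);

    // Treat the input as having a scheme only if it starts with "<scheme>://".
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = $defaultScheme . '://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // nothing usable could be parsed
    }

    // Drop a leading "www." as in the original function.
    return preg_replace('/^www\./i', '', $host);
}

echo hostFromLooseUrl('abc.example.com/some/path'), "\n";
echo hostFromLooseUrl('http://www.example.com/'), "\n";
```

Because the host is still extracted by parse_url, a trailing path or port on a scheme-less input is dropped rather than returned as part of the domain.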
Here is a pure regex solution:
function getDomainFromUrl($url) {
    if (preg_match('/^(?:https?:\/\/)?(?:(?:[^@]*@)|(?:[^:]*:[^@]*@))?(?:www\.)?([^\/:]+)/', $url, $parts)) {
        return $parts[1];
    }
    return false; // or maybe '', depending on what you need
}
getDomainFromUrl("http://abc.example.com/"); // abc.example.com
getDomainFromUrl("http://www.example.com/"); // example.com
getDomainFromUrl("abc.example.com"); // abc.example.com
getDomainFromUrl("username@abc.example.com"); // abc.example.com
getDomainFromUrl("https://username:password@abc.example.com"); // abc.example.com
getDomainFromUrl("https://username:password@abc.example.com:123"); // abc.example.com
You can try it here:
http://sandbox.onlinephpfunctions.com/code/3f0343bbb68b190bffff5d568470681c00b0c45c
If you want to know more about the regex:
^                  matching must start at the beginning of the string
(?:https?:\/\/)?   an optional, non-capturing group that matches http:// or https://
(?:(?:[^@]*@)|(?:[^:]*:[^@]*@))?
                   an optional, non-capturing group that matches either *@ or *:*@, where * is any run of characters
(?:www\.)?         an optional, non-capturing group that matches www.
([^\/:]+)          a capturing group that matches anything up until a '/', a ':', or the end of the string
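The same breakdown can be kept next to the pattern itself by using PCRE's x modifier, which ignores whitespace in the pattern and allows # comments. This is a sketch of an equivalent pattern, with the redundant inner groups removed and ~ as the delimiter so the slashes need no escaping; getDomainVerbose is a hypothetical name used only here.

```php
<?php
// Hypothetical variant of the regex solution, written with the x modifier.
function getDomainVerbose(string $url)
{
    $pattern = '~
        ^                          # anchor at the start of the string
        (?:https?://)?             # optional http:// or https://
        (?:[^@]*@|[^:]*:[^@]*@)?   # optional user@ or user:password@
        (?:www\.)?                 # optional leading www.
        ([^/:]+)                   # host: up to a slash, a colon, or the end
    ~x';

    // Return the captured host, or false when nothing matched.
    return preg_match($pattern, $url, $parts) ? $parts[1] : false;
}

echo getDomainVerbose('https://username:password@abc.example.com:123'), "\n";
```

One caveat with this approach: `[^@]*` can run past a `/`, so an `@` that appears later in the path is treated as the end of the user info and the text after it is captured instead of the real host.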