Get domain with its subdomain from url

I am using this function to get the domain and subdomain from a string. But if the string is already in the format I expect, it returns null.

function getDomainFromUrl($url) {
    $host = parse_url($url, PHP_URL_HOST);
    return preg_replace('/^www\./', '', $host);
}

$url = "http://abc.example.com/" -> abc.example.com | OK

$url = "http://www.example.com/" -> example.com | OK

$url = "abc.example.com" -> FAILS!

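The problem is that parse_url finds no host when the string has no scheme: "abc.example.com" is treated as a path, so the host component comes back as null. Check that you actually got a host before using it, otherwise $host is empty.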

<?php
function getDomainFromUrl($url) {
    $host = parse_url($url, PHP_URL_HOST);
    // No host means the string had no scheme; fall back to the string itself.
    if ($host == '') { $host = $url; }
    return preg_replace('/^www\./', '', $host);
}
echo getDomainFromUrl("http://abc.example.com/") . "\n";
echo getDomainFromUrl("http://www.example.com/") . "\n";
echo getDomainFromUrl("abc.example.com");

Output:

abc.example.com
example.com
abc.example.com
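To see why the fallback is needed, it can help to print what parse_url reports for each kind of input. The snippet below is only an illustrative sketch; `describeUrl` is a hypothetical helper name, not part of the original code.

```php
<?php
// Hypothetical helper: returns the host and path components that
// parse_url() extracts from the given string.
function describeUrl(string $input): array
{
    return [
        parse_url($input, PHP_URL_HOST),
        parse_url($input, PHP_URL_PATH),
    ];
}

// With a scheme the host is found; without one the whole string is the path.
foreach (["http://abc.example.com/", "abc.example.com"] as $input) {
    [$host, $path] = describeUrl($input);
    printf("%-26s host=%s path=%s\n", $input, var_export($host, true), var_export($path, true));
}
```

For the scheme-less string the host is null and the whole string shows up as the path, which is why the answer falls back to `$url`.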

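That's because abc.example.com has no scheme, so parse_url cannot return a PHP_URL_HOST for it; you need to check for that first. You could do something as simple as this: if the url has no protocol, add it: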

function addhttp($url) {
    if (!preg_match("~^(?:f|ht)tps?://~i", $url)) {
        $url = "http://" . $url;
    }
    return $url;
}

function getDomainFromUrl($url) {
    $host = parse_url($url, PHP_URL_HOST);
    if (!$host) {
        // Not a url with protocol: add one and parse once more.
        // Parsing a single extra time instead of recursing avoids
        // endless recursion on input such as "".
        $host = parse_url(addhttp($url), PHP_URL_HOST);
    }
    return $host ? preg_replace('/^www\./', '', $host) : null;
}

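The parse_url() function does not return a host for relative URLs, that is, URLs without a scheme. You can test whether the scheme is present and add a default one if it is not: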

if ( !preg_match( '/^([^\:]+)\:\/\//', $url ) ) $url = 'http://' . $url;
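Put together with the original function, that check could look roughly like this. It is a minimal sketch, assuming http is an acceptable default scheme; `hostFromLooseUrl` is a hypothetical name used only for illustration.

```php
<?php
// Hypothetical wrapper around the scheme check shown above.
function hostFromLooseUrl(string $url): ?string
{
    $url = trim($url);
    // No "scheme://" prefix: assume http so parse_url() sees an absolute URL.
    if (!preg_match('/^([^\:]+)\:\/\//', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    // parse_url() gives null or false when no host can be extracted.
    if (!is_string($host) || $host === '') {
        return null;
    }
    return preg_replace('/^www\./', '', $host);
}

var_dump(hostFromLooseUrl("abc.example.com"));
var_dump(hostFromLooseUrl("http://www.example.com/"));
```

Returning null when no host can be found is a design choice of this sketch; return '' or false instead if that suits the calling code better.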

Here is a pure regex solution:

function getDomainFromUrl($url) {
    if (preg_match('/^(?:https?:\/\/)?(?:(?:[^@]*@)|(?:[^:]*:[^@]*@))?(?:www\.)?([^\/:]+)/', $url, $parts)) {
        return $parts[1];
    }
    return false; // or maybe '', depending on what you need
}

getDomainFromUrl("http://abc.example.com/"); // abc.example.com

getDomainFromUrl("http://www.example.com/"); // example.com

getDomainFromUrl("abc.example.com");         // abc.example.com

getDomainFromUrl("username@abc.example.com"); // abc.example.com

getDomainFromUrl("https://username:password@abc.example.com"); // abc.example.com

getDomainFromUrl("https://username:password@abc.example.com:123"); // abc.example.com

You can try it here: http://sandbox.onlinephpfunctions.com/code/3f0343bbb68b190bffff5d568470681c00b0c45c

If you want to know more about the regex:

^                 matching must start from the beginning of the string
(?:https?:\/\/)?  an optional, non-capturing group that matches http:// and https://
(?:(?:[^@]*@)|(?:[^:]*:[^@]*@))?
                  an optional, non-capturing group that matches either *@ or *:*@ where * is any run of characters
(?:www\.)?        an optional, non-capturing group that matches www.
([^\/:]+)         a capturing group that matches anything up until a '/', a ':', or the end of the string
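If it helps, the same pattern can be written with the x (extended) flag so that the explanation sits next to each part. This is only a sketch: `getDomainCommented` is a hypothetical name, and `~` is used as the delimiter so the slashes need no escaping.

```php
<?php
// Same pattern as above, spread over several lines with inline comments.
// In x mode, whitespace in the pattern is ignored and "#" starts a comment.
function getDomainCommented(string $url)
{
    $pattern = '~
        ^                                  # anchor at the start of the string
        (?:https?://)?                     # optional http:// or https://
        (?:(?:[^@]*@)|(?:[^:]*:[^@]*@))?   # optional user@ or user:password@
        (?:www\.)?                         # optional www.
        ([^/:]+)                           # host: up to "/", ":" or the end
    ~x';
    return preg_match($pattern, $url, $parts) ? $parts[1] : false;
}

var_dump(getDomainCommented("https://username:password@abc.example.com:123"));
```

One thing to keep in mind with this pattern: the user part matches everything up to the last '@' in the string, so a URL with an '@' in its path or query (for example "http://example.com/?u=a@b.com") yields "b.com" rather than "example.com".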