Extract Second Level Domain from URL with RegEx

Question

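Update. Assume the domain is the last two parts of the hostname, unless the second-to-last part is co or com, in which case the domain is the last three parts. If there is only one part, that part is the domain.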

Cases that must be handled, at minimum:

http://google.com          -> google.com
http://www.google.com      -> google.com
http://abc.cde.google.com  -> google.com
http://google.co.uk        -> google.co.uk
http://www.google.com.au   -> google.com.au
http://www.mysite.info     -> mysite.info
http://www.mysite.business -> mysite.business
http://localhost           -> localhost
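To pin down the rule from the update before writing any regex, here is a minimal non-regex sketch of it in Python. It swaps in `urllib.parse.urlsplit` to isolate the hostname (which also drops ports, paths and queries), then applies "last two parts, or last three when the second-to-last part is co or com". The function name `second_level_domain` is hypothetical and not from the question.

```python
from urllib.parse import urlsplit


def second_level_domain(url: str) -> str:
    """Hypothetical helper applying the rule stated in the question's update."""
    # .hostname strips the scheme, port, path, query and fragment, and lowercases.
    host = urlsplit(url).hostname or ""
    parts = host.split(".")
    # Second-to-last part is co/com: keep the last three parts (google.co.uk).
    if len(parts) >= 3 and parts[-2] in ("co", "com"):
        return ".".join(parts[-3:])
    # Otherwise keep the last two parts; a single part (localhost) is kept as-is.
    return ".".join(parts[-2:])


print(second_level_domain("http://www.google.com.au"))  # google.com.au
```

Having a plain-code reference like this makes it easy to check any candidate regex against the same list of cases.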

Regex sandbox for this question

Here are the tests and some starter regexes: https://regex101.com/r/AyuW88/3

As a bonus, a few more cases (but I'd be happy with a regex that only handles the cases above):

http://google.com:8080      -> google.com
http://www.google.com?q=abc -> google.com
http://www.google.com/smth  -> google.com
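One way to cover both the required and the bonus cases with a single regex is sketched below in Python. The pattern is hypothetical (it is not one of the sandbox regexes) and assumes the URL starts with http:// or https://, has no `user@` part, and that the hostname ends at the first `:`, `/`, `?` or `#`. A lazy prefix skips subdomain labels, and a lookahead forces the captured group to end exactly where the hostname ends, so the capture is always the final two (or three, for co/com) labels.

```python
import re
from typing import Optional

# Hypothetical pattern. Assumes an http(s) URL with no "user@" part, where the
# host ends at ":", "/", "?", "#" (port, path, query, fragment) or end of string.
SLD = re.compile(
    r"^https?://"
    r"(?:[^./?#:]+\.)*?"                # lazily skip leading subdomain labels
    r"("
    r"[^./?#:]+(?:\.com?)?\.[^./?#:]+"  # name.tld or name.co.tld / name.com.tld
    r"|[^./?#:]+"                       # single label, e.g. localhost
    r")"
    r"(?=[:/?#]|$)",                    # the group must end where the host ends
    re.IGNORECASE,
)


def extract_sld(url: str) -> Optional[str]:
    """Hypothetical helper: return the captured domain, or None if no match."""
    m = SLD.search(url)
    return m.group(1) if m else None


print(extract_sld("http://www.google.com?q=abc"))  # google.com
```

The `(?:\.com?)?` part is where the "second-to-last part is co or com" assumption is hard-coded; because it must be followed by another `\.`, it only matches a whole `co` or `com` label.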

Answer 1

This regex should solve your use case.

Regex: (?<=http(s)?:\/\/).*

Explanation:
(?<=http(s)?:\/\/): a positive lookbehind that checks whether the preceding text is http:// or https://.
.*: captures everything after that.
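To see what this pattern returns, here is a small Python sketch of it. Python's `re` module only accepts fixed-width lookbehinds, so `(?<=http(s)?:\/\/)` is rewritten as two fixed-width lookbehinds joined by an alternation; the behavior is otherwise the same. The name `after_scheme` is hypothetical.

```python
import re
from typing import Optional

# re rejects the variable-width lookbehind (?<=http(s)?://), so each scheme
# gets its own fixed-width lookbehind instead.
ANSWER1 = re.compile(r"(?:(?<=http://)|(?<=https://)).*")


def after_scheme(url: str) -> Optional[str]:
    """Hypothetical helper: return everything after http:// or https://."""
    m = ANSWER1.search(url)
    return m.group(0) if m else None


print(after_scheme("http://www.google.com/smth"))  # www.google.com/smth
```

Because `.*` takes the whole remainder, subdomains, ports and paths stay in the result; stripping them down to the domain needs further processing.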

Link: https://regex101.com/r/fX1fI5/130

Hope this helps.

Answer 2

This should work for your simple cases:

 r'([^\/\.]+\.(com|co)\.\w+|[^\/\.]+.\w+)$'

The domain is captured in group 1. Your assumption "except the second is co or com" is hard-coded in the regex. Also, one line in the question has a typo:
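A quick way to check this pattern is to run it with Python's `re.search` and read group 1, as in the sketch below (the helper name `answer2_domain` is hypothetical). Note that the second alternative uses an unescaped `.`, which matches any character; that is also what lets a dotless host such as localhost match.

```python
import re
from typing import Optional

# Answer 2's pattern, unchanged. In the second alternative the unescaped "."
# matches any character, so a single label like "localhost" still matches.
ANSWER2 = re.compile(r'([^\/\.]+\.(com|co)\.\w+|[^\/\.]+.\w+)$')


def answer2_domain(url: str) -> Optional[str]:
    """Hypothetical helper: return group 1 (group 2 only holds com or co)."""
    m = ANSWER2.search(url)
    return m.group(1) if m else None


for url in ("http://google.co.uk", "http://localhost", "http://google.com:8080"):
    print(url, "->", answer2_domain(url))
# http://google.co.uk -> google.co.uk
# http://localhost -> localhost
# http://google.com:8080 -> com:8080
```

As the answer says, it targets the simple cases: it handles the required list, but because of the `$` anchor a port, query or path ends up in the match, as the last line shows.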

http://www.google.com.au   -> google.com.ua

It should be "google.com.au".

