RegEx for failing subdomains

Question

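Basically, I want to check for a valid URL that has no subdomain. I can't seem to work out the right regular expression.

Examples of URLs that should match: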

example.com
www.example.com
example.co.uk
example.com/page
example.com?key=value

Examples of URLs that should not match:

test.example.com
sub.test.example.com

Answer 1

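Here, we can start with an expression that is bounded on the right by .com or .co.uk (and other TLDs, if desired). We then move left to collect all non-dot characters, add an optional www and an optional http(s) scheme, and finally add the start anchor ^. Because only non-dot characters are allowed between the optional www. and the TLD, any input with a subdomain fails: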

^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk)(.+|)$
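As a quick check, the expression above can be run against the sample URLs from the question. This is only a minimal sketch: the variable names `noSubdomain`, `shouldMatch` and `shouldFail` are made up for illustration, and the pattern is used without the `g` and `m` flags because each string is tested on its own.

```javascript
// Pattern copied from the answer; each input is tested individually.
const noSubdomain = /^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk)(.+|)$/;

// Sample inputs taken from the question.
const shouldMatch = [
  "example.com",
  "www.example.com",
  "example.co.uk",
  "example.com/page",
  "example.com?key=value",
];
const shouldFail = ["test.example.com", "sub.test.example.com"];

// Print whether each input passes the pattern.
for (const url of [...shouldMatch, ...shouldFail]) {
  console.log(`${url} -> ${noSubdomain.test(url)}`);
}
```

The first five inputs should print `true` and the last two `false`, since the extra label in front of `example` leaves a dot that `[^.]+` cannot consume.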

Other TLDs can be added to this capturing group:

(\.com|\.co\.uk|\.net|\.org|\.business|\.edu|\.careers|\.coffee|\.college)

and the expression can then be modified to:

^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk|\.net|\.org|\.business|\.edu|\.careers|\.coffee|\.college)(.+|)$
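If the TLD list is expected to grow, one option is to build the alternation from an array rather than editing the pattern by hand. The sketch below assumes a hypothetical helper named `buildNoSubdomainRegex` (not part of the original answer) that produces the same shape of expression as above.

```javascript
// Hypothetical helper: builds the answer's pattern from a list of allowed TLDs.
function buildNoSubdomainRegex(tlds) {
  // Escape regex metacharacters so the dot in "co.uk" is matched literally.
  const escaped = tlds.map((tld) => tld.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  // Prefix each TLD with an escaped dot and join into one alternation.
  const alternation = escaped.map((tld) => `\\.${tld}`).join("|");
  return new RegExp(`^(https?:\\/\\/)?(www\\.)?([^.]+)(${alternation})(.+|)$`);
}

// TLD list taken from the expression in the answer.
const allowedTlds = ["com", "co.uk", "net", "org", "business", "edu", "careers", "coffee", "college"];
const tldRegex = buildNoSubdomainRegex(allowedTlds);

console.log(tldRegex.test("example.org"));      // true
console.log(tldRegex.test("test.example.org")); // false
```

Keeping the list in one array makes it easier to review which TLDs are accepted, while the matching behaviour stays the same as the hand-written expression.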

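Flexibility

I can't think of a way to make the TLD part much more flexible, since this is a validation expression. For instance, if we simplified it to: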

^(https?:\/\/)?(www\.)?([^.]+)(\.[a-z]+)(\.uk?)?[a-z?=\/]+$

it might work for the URLs listed in the question, but it would also pass:

example.example

which is invalid. The looser expression

^(https?:\/\/)?(www\.)?([^.]+)(\.[a-z]+)(\.uk?)?[a-z?=\/]+$

is only safe to use if we already know that the input we pass is a URL.
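The trade-off can be seen by running both expressions side by side. This is a small sketch; the names `strict` and `flexible` are labels chosen here for the two patterns from the answer.

```javascript
// Strict pattern with an explicit TLD list, and the looser pattern from this section.
const strict = /^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk)(.+|)$/;
const flexible = /^(https?:\/\/)?(www\.)?([^.]+)(\.[a-z]+)(\.uk?)?[a-z?=\/]+$/;

// Compare how each pattern treats a valid URL, a non-URL, and a subdomain.
for (const input of ["example.com", "example.example", "test.example.com"]) {
  console.log(input, { strict: strict.test(input), flexible: flexible.test(input) });
}
```

Both patterns accept `example.com` and reject `test.example.com`, but only the flexible one accepts `example.example`, which is why it should not be used to validate arbitrary input.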


Demo

This snippet just shows how the capturing groups work:

const regex = /^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk)(.+|)$/gm;
const str = `example.com
www.example.com
example.co.uk
example.com/page
example.com?key=value

test.example.com
sub.test.example.com`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

RegEx Circuit

jex.im visualizes regular expressions.

RegEx

If this expression isn't what you need, it can be modified or changed on regex101.com.
