Regex which captures url prefix but excludes www

Question

I've been trying to wrap my head around regular expressions in JavaScript (I'm no expert), but I haven't been able to solve this one.

This is my URL pattern:

https://www.prefix.site.com

My current regex:

/(?:(\w+)\.)?site\.com/

What I need is to capture the prefix before '.site', but I don't want to include 'https://www.', given that both 'www.' and my prefix may or may not be present. An example of my prefix could be an environment, e.g. https://testing.site.com

The problem with the regex above is that if 'www.' is present without my prefix, it captures 'www' as the prefix, which is not what I need.
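For reference, the behavior described here can be reproduced with a minimal sketch like the one below. The helper name `capturePrefix` is made up for illustration, and the pattern is the one from the question written as a complete regex literal.

```javascript
// Pattern from the question: an optional "<word>." group followed by site.com.
const currentPattern = /(?:(\w+)\.)?site\.com/;

// Hypothetical helper: returns capture group 1, or null when the
// optional group did not take part in the match.
function capturePrefix(url) {
  const match = url.match(currentPattern);
  return match && match[1] !== undefined ? match[1] : null;
}

console.log(capturePrefix("https://www.prefix.site.com")); // "prefix"
console.log(capturePrefix("https://testing.site.com"));    // "testing"
console.log(capturePrefix("https://www.site.com"));        // "www" (the unwanted case)
console.log(capturePrefix("https://site.com"));            // null
```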

I sort of solved it with a negative lookbehind, but since that isn't available in JavaScript, I can't use it.

Any tips would be appreciated!

Answer 1

It sounds like the following would work for you:

https?://(?:w{3}\.)?(\w+)\.site\.com
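A rough sketch of how this pattern might be used from JavaScript, where the forward slashes have to be escaped inside a regex literal. The helper name `prefixViaOptionalWww` is hypothetical. One caveat worth checking against your own URLs: because `(?:w{3}\.)?` is optional, the engine can backtrack and skip it, so a bare `www.site.com` appears to still yield `www`, and a URL with no prefix at all does not match.

```javascript
// Same pattern as above, with "/" escaped for use in a regex literal.
const optionalWwwPattern = /https?:\/\/(?:w{3}\.)?(\w+)\.site\.com/;

// Hypothetical helper: returns capture group 1, or null when nothing matches.
function prefixViaOptionalWww(url) {
  const match = url.match(optionalWwwPattern);
  return match ? match[1] : null;
}

console.log(prefixViaOptionalWww("https://www.prefix.site.com")); // "prefix"
console.log(prefixViaOptionalWww("https://testing.site.com"));    // "testing"
console.log(prefixViaOptionalWww("https://www.site.com"));        // "www" (optional group is skipped)
console.log(prefixViaOptionalWww("https://site.com"));            // null (a prefix is required)
```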

Answer 2

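At the very start of the capture group, you can add a negative lookahead for www. to ensure that the capture group only matches when it contains something other than www.: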

((?!www\.)\b\w+\.)?site\.com

https://regex101.com/r/K8btgd/1

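Note the word boundary \b: it ensures that the captured group either starts after a non-word character (like a / or a .) or won't match at all (to prevent matches such as ww.site.com, where a third w came before it).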
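Here is a small sketch of using that pattern in code. The helper name `prefixViaLookahead` is hypothetical. Since the `\.` sits inside the capture group, group 1 includes the trailing dot (for example `prefix.`), so this sketch strips it before returning.

```javascript
// Pattern from this answer, unchanged.
const lookaheadPattern = /((?!www\.)\b\w+\.)?site\.com/;

// Hypothetical helper: returns the prefix without its trailing dot,
// or null when the optional group did not take part in the match.
function prefixViaLookahead(url) {
  const match = url.match(lookaheadPattern);
  if (!match || match[1] === undefined) return null;
  return match[1].slice(0, -1); // drop the trailing "."
}

console.log(prefixViaLookahead("https://www.prefix.site.com")); // "prefix"
console.log(prefixViaLookahead("https://testing.site.com"));    // "testing"
console.log(prefixViaLookahead("https://www.site.com"));        // null
console.log(prefixViaLookahead("https://site.com"));            // null
```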

Answer 3

Depending on your needs, this expression captures only the prefix: (?!w{1,3}\.)[\w-]+(?=\.example)

https://regex101.com/r/X4L9ZZ/2

It supports dashes and correctly allows "w" in the prefix/sub-domain.

Sample:

// Returns the prefix, or null when the URL has no prefix other than "www".
const getPrefix = uri => {
  const matched = uri.match(/(?!w{1,3}\.)[\w-]+(?=\.example)/);
  return matched && matched[0];
};

getPrefix("https://www.prefix.example.com"); // "prefix"
getPrefix("https://prefix.example.com"); // "prefix"
getPrefix("https://www.example.com"); // null
getPrefix("https://example.com"); // null

The good news is that lookbehinds will soon be fully supported in JS. The proposal is already at stage 4 and only needs cross-browser implementation! https://github.com/tc39/proposal-regexp-lookbehind
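For engines that do implement lookbehind, something along these lines might work. This is only a sketch: the pattern and the helper name `getPrefixWithLookbehind` are my own illustration rather than part of the proposal or of the answers above, and it assumes a runtime with ES2018 lookbehind support and a URL that starts with the scheme.

```javascript
// Assumption: the runtime supports lookbehind assertions (ES2018).
// (?<=...) requires the scheme and an optional "www." right before the prefix;
// (?!www\.) stops "www" itself from being returned as the prefix.
const lookbehindPattern =
  /(?<=^https?:\/\/(?:www\.)?)(?!www\.)[\w-]+(?=\.example\.com)/;

// Hypothetical helper: returns the matched prefix, or null when there is none.
function getPrefixWithLookbehind(url) {
  const matched = url.match(lookbehindPattern);
  return matched ? matched[0] : null;
}

console.log(getPrefixWithLookbehind("https://www.prefix.example.com")); // "prefix"
console.log(getPrefixWithLookbehind("https://prefix.example.com"));     // "prefix"
console.log(getPrefixWithLookbehind("https://www.example.com"));        // null
console.log(getPrefixWithLookbehind("https://example.com"));            // null
```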

javascript

regex

lookaround