Extract url domain root in Google Sheet

Question

In a table, I have a list of full URLs, for example:

Objective: I want to extract only the domain part of the URL:

I am using the following formula:

=REGEXEXTRACT(A1;"^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n?]+)")

The regex works fine when I test it:

https://www.example.com/

However, in Google Sheets it displays as:

example.com

Answer 1

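When the pattern contains a capturing group, REGEXEXTRACT returns only the captured text rather than the whole match. You can either remove the capturing group (here, ([^:\/\n?]+) => [^:\/\n?]+) or convert it to a non-capturing one (([^:\/\n?]+) => (?:[^:\/\n?]+)):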

=REGEXEXTRACT(A1;"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?[^:/\n?]+")
=REGEXEXTRACT(A1;"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?(?:[^:/\n?]+)")
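The difference between the question's formula and the two formulas above can be reproduced outside of Sheets. The following is a minimal Python sketch, not Google's implementation: regexextract is a hypothetical helper that imitates the behaviour described here (first capturing group if the pattern has one, otherwise the whole match), and Python's re module stands in for RE2, which should behave the same for these particular patterns.

```python
import re

def regexextract(text, pattern):
    """Hypothetical stand-in for REGEXEXTRACT, for illustration only."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # With a capturing group, return the captured text; otherwise the whole match.
    return m.group(1) if m.re.groups else m.group(0)

url = "https://www.example.com/"

# Pattern from the question (slashes left unescaped): has one capturing group.
with_group = r"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?([^:/\n?]+)"
# The two variants from the answer: no capturing group at all.
without_group = r"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?[^:/\n?]+"
non_capturing = r"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?(?:[^:/\n?]+)"

print(regexextract(url, with_group))     # example.com
print(regexextract(url, without_group))  # https://www.example.com
print(regexextract(url, non_capturing))  # https://www.example.com
```

Note that the match stops before the first / after the host, so the trailing slash is not part of the result.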

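Notes:

You do not need to escape / forward slashes in an RE2 regex, because in Google Sheets the pattern is defined as a string literal.

The pattern can be simplified to ^(?:https?://)?[^:/\n?]+, which optionally matches http:// or https:// and then one or more characters other than :, /, a newline, or ?.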
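As a quick check of the simplified pattern, here is a small Python sketch that compares it with the longer pattern on a few sample URLs. The sample URLs and the whole_match helper are assumptions made for illustration, and Python's re module is used in place of RE2.

```python
import re

# Longer pattern from the answer (no capturing group) and the simplified one.
FULL = re.compile(r"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?[^:/\n?]+")
SIMPLE = re.compile(r"^(?:https?://)?[^:/\n?]+")

def whole_match(pattern, text):
    """Return the whole match, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None

# Hypothetical sample URLs, not taken from the original question.
samples = [
    "https://www.example.com/page?x=1",
    "http://example.com:8080/path",
    "example.com/path",
]

for url in samples:
    print(url, "->", whole_match(SIMPLE, url), "|", whole_match(FULL, url))
```

For these samples both patterns return the same text. They differ once the URL carries a user:password@ prefix: on https://user:pw@example.com/ the longer pattern returns https://user:pw@example.com, while the simplified one stops at the first colon and returns https://user.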
