Write regex without negations

Question

In a previous question, I asked for help rewriting a regex so that it does not use negation.

Starting regex:

https?:\/\/(?:.(?!https?:\/\/))+$

Ending regex:

https?:[^:]*$

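This works fine, but I noticed that if my URL contains a : other than the one right after http(s), the regex does not select it.

Here is a string for which it does not work: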

sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2

Notice the :query2 at the end.
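To reproduce the problem, here is a minimal sketch assuming Go's standard regexp package. The helper name matchEnding and the first input string are made up for illustration; the second input is the one from the question. Because [^:]* cannot step over a colon, the ending regex finds nothing once the last URL has a second colon.

```go
package main

import (
	"fmt"
	"regexp"
)

// endingRe is the ending regex from the question: "http:" or "https:"
// followed only by non-colon characters up to the end of the input.
var endingRe = regexp.MustCompile(`https?:[^:]*$`)

// matchEnding (hypothetical helper) returns the matched text,
// or "" when the regex does not match at all.
func matchEnding(s string) string {
	return endingRe.FindString(s)
}

func main() {
	// Illustrative input: the last URL has no colon after the scheme.
	fmt.Printf("%q\n", matchEnding("sometexthttp://websites.com/path/#query1sometexthttp://websites.com/path/subpath/query2"))
	// Input from the question: the extra colon in ":query2" blocks [^:]*.
	fmt.Printf("%q\n", matchEnding("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2"))
}
```

The first call prints the last URL; the second prints an empty string because there is no match.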

How can I modify the second regex listed here so that it also selects URLs that contain a :?

Expected output:

http://websites.com/path/subpath/cc:query2

I would also like to select everything up to the first occurrence of ?=param

Input: sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param

Output:

http://websites.com/path/subpath/cc:query2/text/

Answer 1

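Unfortunately, Go regex does not support lookarounds. However, you can get the last link with a trick: greedily match all possible links and any other characters, then capture the last link with a capturing group: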

^(?:https?://|.)*(https?://\S+?)(?:\?=|$)

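Together with the lazy \S+? (one or more non-whitespace characters, as few as possible), this also stops the captured link right before ?=.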

See the regex demo and the Go demo.

var r = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)
fmt.Printf("%q\n", r.FindAllStringSubmatch("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2", -1)[0][1])
fmt.Printf("%q\n", r.FindAllStringSubmatch("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param", -1)[0][1])

Result:

"http://websites.com/path/subpath/:query2"
"http://websites.com/path/subpath/cc:query2/text/"
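For anyone who wants to run this end to end, below is a sketch of the same idea as a complete program. The helper names lastLink and lookaheadCompiles are made up for this example. It uses FindStringSubmatch and checks for a nil result, so an input without any link does not cause an index-out-of-range panic, and it also checks that the lookahead-based regex from the question is rejected by regexp.Compile.

```go
package main

import (
	"fmt"
	"regexp"
)

// lastLinkRe: the greedy prefix consumes everything up to the last
// "http://" or "https://"; group 1 then lazily captures non-whitespace
// characters up to "?=" or the end of the input.
var lastLinkRe = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)

// lastLink (hypothetical helper) returns the captured link and
// whether the input contained one.
func lastLink(s string) (string, bool) {
	m := lastLinkRe.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// lookaheadCompiles (hypothetical helper) reports whether the starting
// regex from the question, which uses (?!...), compiles in Go.
func lookaheadCompiles() bool {
	_, err := regexp.Compile(`https?:\/\/(?:.(?!https?:\/\/))+$`)
	return err == nil
}

func main() {
	fmt.Println("lookahead compiles:", lookaheadCompiles())

	inputs := []string{
		"sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2",
		"sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param",
		"no links here", // illustrative input without any URL
	}
	for _, in := range inputs {
		link, ok := lastLink(in)
		fmt.Printf("%q %v\n", link, ok)
	}
}
```

Returning a boolean alongside the link is just one design choice; returning an error instead would work equally well.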

If the last link can contain whitespace, just use .+? instead:

^(?:https?://|.)*(https?://.+?)(?:\?=|$)
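To see the difference between the two variants, here is a small sketch that runs both on a made-up input whose last link contains a space. The input string and the helper name capture are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// nonSpaceRe stops the link at the first whitespace character.
	nonSpaceRe = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)
	// anyCharRe lets the link contain any character except a newline.
	anyCharRe = regexp.MustCompile(`^(?:https?://|.)*(https?://.+?)(?:\?=|$)`)
)

// capture (hypothetical helper) returns group 1 of re on s,
// or "" when re does not match.
func capture(re *regexp.Regexp, s string) string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// Illustrative input: the last link contains a space before "?=".
	in := "sometexthttp://websites.com/ahttp://websites.com/my path/?=param"
	fmt.Printf("%q\n", capture(nonSpaceRe, in)) // no match: \S cannot cross the space
	fmt.Printf("%q\n", capture(anyCharRe, in))  // link including the space
}
```

Note that . does not match a newline by default in Go, so the .+? variant still assumes the link sits on a single line.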

regex

go

regex-negation